[pull] main from llvm:main #5542

…th (#97353) Make the results of the two IsInteroperableIntrinsicType() utility routines a tri-state std::optional<bool> so that cases where the character length is simply unknown can be distinguished from those cases where the length is known and not acceptable. Use this distinction to not emit a confusing warning about interoperability with C_LOC() arguments when the length is unknown and might well be acceptable during execution.
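A minimal C++ sketch of the tri-state idea. It uses two hypothetical helpers, `IsInteroperableLength()` and `ShouldWarnForCLoc()`, which are not the actual flang API. For illustration it also assumes that only a known length of 1 is acceptable. `std::nullopt` stands for "length not known at compile time", so only a definite `false` produces the warning.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: is a CHARACTER entity with this length interoperable?
//   std::nullopt -> length unknown at compile time (may be fine at run time)
//   true/false   -> length known; assumed acceptable only when it is 1
std::optional<bool> IsInteroperableLength(std::optional<std::int64_t> len) {
  if (!len) {
    return std::nullopt;
  }
  return *len == 1;
}

// Hypothetical caller: warn about a C_LOC() argument only when the length is
// known to be unacceptable, not when it is merely unknown.
bool ShouldWarnForCLoc(std::optional<std::int64_t> len) {
  std::optional<bool> ok = IsInteroperableLength(len);
  return ok.has_value() && !*ok;
}
```

The extra state lets the caller tell "definitely wrong" apart from "cannot tell yet" without a second query.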

Support the predefined macro __TIMESTAMP__ as interpreted by GCC. It expands to a character literal with the time of last modification of the top-level source file in asctime(3) format, e.g. "Tue Jul  4 10:18:05 1776".
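The asctime(3) layout is easy to reproduce. The following C++ sketch uses a hypothetical helper, `FormatAsctime`, and is not flang's implementation. It assumes a `std::tm` has already been filled in from the file's modification time, for example via `stat()` and `localtime()`. It follows the format string given for `asctime()` in the C standard. The day of the month is space-padded to width 3, so 4 July renders as `Jul  4` with two spaces.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>
#include <string>

// Formats a broken-down time using the C standard's asctime() layout,
// "%.3s %.3s%3d %.2d:%.2d:%.2d %d", but without asctime()'s trailing newline,
// since the macro expands to the text only.
std::string FormatAsctime(const std::tm &t) {
  static const char *const kDays[] = {"Sun", "Mon", "Tue", "Wed",
                                      "Thu", "Fri", "Sat"};
  static const char *const kMonths[] = {"Jan", "Feb", "Mar", "Apr",
                                        "May", "Jun", "Jul", "Aug",
                                        "Sep", "Oct", "Nov", "Dec"};
  char buf[64];
  std::snprintf(buf, sizeof buf, "%.3s %.3s%3d %.2d:%.2d:%.2d %d",
                kDays[t.tm_wday], kMonths[t.tm_mon], t.tm_mday, t.tm_hour,
                t.tm_min, t.tm_sec, 1900 + t.tm_year);
  return buf;
}
```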

…ackages (#98420) This reduces Sphinx dependencies for building lldb man pages as lldb man pages don't use markdown.

Section IDs are 64-bit. If a section ID did not fit in 32 bits (was greater than 0xffffffff), the tabular output of the "target modules dump sections" command would not align with the column headers. Also, if the section type's name was too long, the output wouldn't align. This patch fixes both issues. Old output looked like:
```
(lldb) image dump sections a.out
Sections for '/tmp/a.out' (arm):
SectID     Type             File Address                            Perm File Off.  File Size  Flags      Section Name
---------- ---------------- --------------------------------------- ---- ---------- ---------- ---------- ----------------------------
0xffffffffffffffff container        [0x0000000000001000-0x0000000000001010) rw-  0x00000074 0x00000010 0x00000000 a.out.PT_LOAD[0]
0x00000001 data             [0x0000000000001000-0x0000000000001010) rw-  0x00000074 0x00000010 0x00000003 a.out.PT_LOAD[0]..data
0xfffffffffffffffe container        [0x0000000000001000-0x0000000000001010) rw-  0x00000084 0x00000000 0x00000000 a.out.PT_TLS[0]
0x00000002 zero-fill        [0x0000000000001000-0x0000000000001010) rw-  0x00000084 0x00000000 0x00000403 a.out.PT_TLS[0]..tbss
0x00000003 regular                                                  ---  0x00000084 0x00000001 0x00000000 a.out..strtab
0x00000004 regular                                                  ---  0x00000085 0x0000001f 0x00000000 a.out..shstrtab
```
New output looks like:
```
(lldb) image dump sections a.out
Sections for '/tmp/a.out' (arm):
SectID             Type                   File Address                            Perm File Off.  File Size  Flags      Section Name
------------------ ---------------------- --------------------------------------- ---- ---------- ---------- ---------- ----------------------------
0xffffffffffffffff container              [0x0000000000001000-0x0000000000001010) rw-  0x00000074 0x00000010 0x00000000 a.out.PT_LOAD[0]
0x0000000000000001 data                   [0x0000000000001000-0x0000000000001010) rw-  0x00000074 0x00000010 0x00000003 a.out.PT_LOAD[0]..data
0xfffffffffffffffe container              [0x0000000000001000-0x0000000000001010) rw-  0x00000084 0x00000000 0x00000000 a.out.PT_TLS[0]
0x0000000000000002 zero-fill              [0x0000000000001000-0x0000000000001010) rw-  0x00000084 0x00000000 0x00000403 a.out.PT_TLS[0]..tbss
0x0000000000000003 regular                                                        ---  0x00000084 0x00000001 0x00000000 a.out..strtab
0x0000000000000004 regular                                                        ---  0x00000085 0x0000001f 0x00000000 a.out..shstrtab
```

This change is an implementation of the #87367 investigation on supporting IEEE math operations as intrinsics, which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This change adds constrained intrinsics and some lowering cases for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh`. The only x86-specific change was for f80. #70079 #70080 #70081 #70083 #70084 #95966 The x86 lowering is going to be done in three PRs, with this being the first. A second PR will be put up for the Loop Vectorizer and then the SLPVectorizer. The constrained intrinsics are also going to be done in multiple parts, but just two. This part covers only the LLVM-specific changes. Part 2 will cover the Clang-specific changes and legalization for backends that have special legalization requirements, like AArch64 and Wasm.

f18 currently emits an error when an assignment is made to an array section with a vector subscript, and the array is finalized with a non-elemental final subroutine. Some other compilers emit this error because (I think) they want variables to only be finalized in place, not by a subroutine call involving copy-in and copy-out of the finalized elements. Since many other Fortran compilers can handle this case, and there's nothing in the standards to preclude it, let's downgrade this error message to a portability warning. This patch got complicated because the API of the WhyNotDefinable() utility routine returned a message only in error cases, with no provision for returning non-fatal messages. It now returns either nothing, a fatal message, or a non-fatal warning message, and all of its call sites have been modified to cope.
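A minimal C++ sketch of the new calling convention. `Message`, the boolean inputs, and `CheckAssignment` are hypothetical stand-ins, not flang's actual types or signatures. The point is only this: `std::nullopt` means "definable", and each call site must now check a returned message's severity before treating it as an error.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical message: either a fatal error or a non-fatal (portability) warning.
struct Message {
  std::string text;
  bool isFatal;
};

// Hypothetical WhyNotDefinable()-style check: nothing if definable, otherwise
// a fatal error or a portability warning.
std::optional<Message> WhyNotDefinable(bool isConstant,
                                       bool vectorSubscriptWithNonElementalFinal) {
  if (isConstant) {
    return Message{"is a constant", true};
  }
  if (vectorSubscriptWithNonElementalFinal) {
    return Message{"vector-subscripted section finalized by a non-elemental "
                   "FINAL subroutine", false};
  }
  return std::nullopt;
}

// A call site: errors reject the assignment, warnings are recorded and
// compilation continues.
bool CheckAssignment(bool isConstant, bool vectorSubscript,
                     std::vector<std::string> &warnings) {
  if (auto msg = WhyNotDefinable(isConstant, vectorSubscript)) {
    if (msg->isFatal) {
      return false;
    }
    warnings.push_back(msg->text);
  }
  return true;
}
```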

) We emit an incorrect error message when !DIR$ IGNORE_TKR appears in a separate module procedure's interface declaration. Fixes #98210.

This patch updates the clang-tidy checks for llvm-libc to ensure that the namespace macro used to declare the libc namespace is updated from LIBC_NAMESPACE to LIBC_NAMESPACE_DECL which by default has hidden visibility. Co-authored-by: Prabhu Rajesakeran <prabhukr@google.com>

Data designators like "a(j:k)" are parsed into array section references, but once rank and type information is in hand, some of them turn out to actually be substring references. The code that recognizes these cases was suffering from a "false positive" in the case of a construct entity in a SELECT RANK construct due to the use of a predicate member function (Symbol::IsObjectArray) that only works on ObjectEntityDetails symbols. Fix the test to use the more general Symbol::Rank() member function.

…98478) This can cause breakage with user code that does "#define A ...".

Doing so allows one side to fold entirely into the mask applied to the other recursive call (or a vmerge.vv at worst). This is a generalization of the existing IsSelect case (both operands are selects), so I removed that code in the process. This actually started as an attempt to remove the IsSelect bit as I'd thought it was fully redundant with the recursive formulation, but digging into test deltas revealed that we depended on that to catch the majority of the identity cases, and that in turn we were missing some cases where only RHS was an identity.

Reverts #98469. We can't add this dependency:
```
OBJECT_LIBS RTSanitizerCommon
            RTSanitizerCommonLibc
```
safestack is security hardening, and RTSanitizerCommon is too fat for that.

Module files emitted by this Fortran compiler are valid Fortran source files. Symbols that are USE-associated into modules are represented in their module files with USE statements and special comments with hash codes in them. These ensure that those USE statements resolve to the same modules that were used to build the module when its module file was generated. This scheme prevents unchecked module file growth in large applications by not emitting USE-associated symbols redundantly. This problem can be especially bad when derived type definitions must be repeated in the module files of their clients, and the clients of those modules, and so on. However, this scheme has the disadvantage that clients of modules must be compiled with dependent modules in the module search path. This new -fhermetic-module-files option causes module file output to be free of dependencies on any non-intrinsic module files; dependent modules are instead emitted as part of the module file, rather than being USE-associated. It is intended for top-level library module files that are shipped with binary libraries when it is not convenient to collect and ship their dependent module files as well. Fixes #97398.

When collecting candidates to pre-compute cost for operands of exit conditions, skip users outside the loop when checking if they are in ExistInstrs. The users outside the loop should be ignored, as they won't make a value live in the VPlan. This fixes a failure when building for X86 with sanitizers on macOS after b841e2e (https://green.lab.llvm.org/job/llvm.org/job/clang-stage2-cmake-RgSan/287/)

Atexit needs to be linked into exit on linux since atexit defines __cxa_finalize. This should probably be fixed a different way but this works for now.

spec: microsoft/hlsl-specs#263
- `Attr.td` - Define the HLSL loop attribute hints (unroll and loop)
- `AttrDocs.td` - Add documentation for unroll and loop
- `CGLoopInfo.cpp` - Add codegen for HLSL unroll that maps to clang unroll expectations
- `ParseStmt.cpp` - For statements, if HLSL, define DeclSpecAttrs via MaybeParseMicrosoftAttributes
- `SemaStmtAttr.cpp` - Add the HLSL loop unroll handling
Resolves #70114
dxc examples:
- for loop: https://hlsl.godbolt.org/z/8EK6Pa139
- while loop: https://hlsl.godbolt.org/z/ebr5MvEcK
- do while: https://hlsl.godbolt.org/z/be8cedoTs
Documentation: ![Screenshot_20240531_143000](https://github.com/llvm/llvm-project/assets/1802579/9da9df9b-68a6-49eb-9d4f-e080aa2eff7f)

The names in the `end function` were incorrect. Those have been removed.

This is analogous to `HUGE_VAL`.

GCC emits a warning when using the visibility attribute which needs to be diagnosed and addressed, but this change should unbreak the GCC build as a temporary workaround. The issue is tracked as #98548.

We automatically inject this dependency into all object libraries that use the libc CMake functions, but `libc_diff_test_utils` uses `add_library` so we need to add this dependency manually.

This matches what is done for FreeBSD. OpenBSD has a few special program header types, and other such ELF extensions. Setting the ELFOSABI like so will allow LLD to support them without needlessly impacting non-OpenBSD ELFs. Testing strategy matches 06cecdc. Take two of #98158 / b64c1de, which was reverted in #98494 / c026135. Preexisting test is fixed now.

I made a mistake in #98553. Sorry.

Adding dep to TosaDialect increases binary size unnecessarily

Update availability information added in 1eb7f05. exp10 is available on iOS >= 7.0 and macOS >= 10.9. On all other platforms, it is available on any version. Also drop the x86 check, as the availability only depends on the OS version, not the target platform. PR: #98542
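The availability rule stated above fits in a few lines. This C++ sketch is hypothetical: the `OS` enum and `IsExp10Available` are illustrative and are not LLVM's actual code. As in the patch, the check depends only on the OS and its version, not on the target architecture.

```cpp
#include <cassert>

// Hypothetical OS enumeration used only for this illustration.
enum class OS { IOS, MacOSX, Linux, Windows };

// exp10 is available on iOS >= 7.0, macOS >= 10.9, and on any version of
// other platforms. No architecture check is involved.
bool IsExp10Available(OS os, unsigned major, unsigned minor) {
  switch (os) {
  case OS::IOS:
    return major >= 7;
  case OS::MacOSX:
    return major > 10 || (major == 10 && minor >= 9);
  default:
    return true;
  }
}
```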

- If there is an object made, there is a space after it
- Fixed tests.yaml: spacing-between-characters issue

Reapplies #92957, fixing an instance where the `template` keyword was missing prior to a dependent name in `llvm/ADT/ArrayRef.h`. An _alias-declaration_ is used to work around a bug affecting GCC releases before 11.1 (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94799) which rejects the use of the `template` keyword prior to the _nested-name-specifier_ in the class member access.
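For readers unfamiliar with the rule, here is a small self-contained C++ example of why `template` is needed before a member template of a dependent type. It is unrelated to the actual `ArrayRef.h` code, and the `Table`/`lookup` names are made up. Without the keyword, the parser would read `get < 2` as a less-than comparison.

```cpp
#include <cassert>

template <typename T>
struct Table {
  template <int N>
  int get() const { return N * static_cast<int>(sizeof(T)); }
};

template <typename T>
int lookup(const Table<T> &table) {
  // `table` has a type that depends on T, so `get` is a dependent name; the
  // `template` keyword says the following `<` opens a template argument list.
  return table.template get<2>();
}
```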

MSVC has a __cpuidex function implemented to call the underlying cpuid instruction which accepts a leaf, subleaf, and data array that the output data is written into. This patch adds this functionality into clang under the cpuid.h header. This also makes clang match GCC's behavior. GCC has had __cpuidex in its cpuid.h since 2020. This is another attempt to land https://reviews.llvm.org/D158348.

If requested, via the -memprof-report-hinted-sizes option, track the total profiled size of each MIB through the thin link, then report on the corresponding allocation coldness after all cloning is complete. To save size, a different bitcode record type is used for the allocation info when the option is specified, and the sizes are kept separate from the MIBs in the index.

…#97363) This patch fixes another place in ProfileData where we have a pointer to an array of InstrProfValueData and its length separately. addValueData is a bit unique in that it remaps incoming values in place before adding them to ValueSites. AFAICT, no caller of addValueData uses the updated incoming values. With this patch, we add value data to ValueSites first and then remap the values there. This way, we can take ArrayRef<InstrProfValueData> as a parameter.
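The reordering can be sketched in C++. `ValueData`, the `std::map` remapper, and the `std::vector` parameter are hypothetical simplifications of `InstrProfValueData`, the real remapper, and `ArrayRef`. The records are copied first and the stored copies are remapped, so the caller's input is never modified and can be taken read-only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for InstrProfValueData.
struct ValueData {
  std::uint64_t Value;
  std::uint64_t Count;
};

// Append the incoming records unchanged, then remap the stored copies in place.
void addValueData(std::vector<std::vector<ValueData>> &valueSites,
                  const std::vector<ValueData> &incoming,
                  const std::map<std::uint64_t, std::uint64_t> &remapper) {
  std::vector<ValueData> &site =
      valueSites.emplace_back(incoming.begin(), incoming.end());
  for (ValueData &vd : site) {
    auto it = remapper.find(vd.Value);
    if (it != remapper.end()) {
      vd.Value = it->second;
    }
  }
}
```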

This test sets a breakpoint on malloc as a way to stop early in dyld's setup code, before the system libraries are initialized. This lets us confirm that we don't fetch the Objective-C class table before it's initialized. In macOS 15 (macOS Sequoia), dyld doesn't call malloc any longer, so this heuristic no longer works. It does call other functions matching `*alloc`, though, so I'm changing this to use a regex breakpoint on those to keep the test working.

When implicit data transfer is created, make sure we generate the `freemem` op on the `allocmem` result value and not the declare op value.

This was found by the Clang Static Analyzer.

PrintAddressSpaceLayout can accidentally mmap into the gap.

See the comment in handleTlsRelocation. For TLSDESC=>IE (the TLS symbol is defined in another DSO), R_RISCV_TLSDESC_{LOAD_LO12,ADD_LO12_I,CALL} referencing a non-preemptible label uses the `R_RELAX_TLS_GD_TO_LE` code path. If there is no TLS section, `getTlsTpOffset` will be called with null `Out::tlsPhdr`, leading to a null pointer dereference. Since the return value is used by `RISCV::relocateAlloc` and ignored there, just return 0. LoongArch TLSDESC doesn't use STT_NOTYPE labels. The `if (..) return 0;` is a no-op for LoongArch. This patch is a follow-up to #79239 and fixes some comments. Pull Request: #98569

For synchronous unwind tables, the call frame information can be slightly reduced by bundling the `.cfi_negate_ra_state` instruction with other CFI instructions in the prolog, saving 1 byte per function used for `DW_CFA_advance_loc`. This was suggested in [D156428](https://reviews.llvm.org/D156428#4554317).

Same as X86: if X's size is BitWidth, then `X sdiv 2` can be expressed as
```
X += X >> (BitWidth - 1)
X = X >> 1
```
Fix #97884
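To make the identity concrete, here is a small C++ check for a 32-bit `BitWidth`. It reads the first `>>` in the listing as a logical shift, which extracts the sign bit, and the second as an arithmetic shift. Adding the sign bit biases negative values by one, so the final arithmetic shift rounds toward zero, as `sdiv` requires. `SDivBy2` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

// Signed division by 2 without a divide, rounding toward zero like `x / 2`.
std::int32_t SDivBy2(std::int32_t x) {
  constexpr unsigned kBitWidth = 32;
  // Logical shift: 1 if x is negative, 0 otherwise.
  std::uint32_t signBit = static_cast<std::uint32_t>(x) >> (kBitWidth - 1);
  // X += X >> (BitWidth - 1), computed in unsigned arithmetic to avoid signed
  // overflow. The conversion back is modular on mainstream compilers and
  // guaranteed since C++20.
  std::int32_t biased =
      static_cast<std::int32_t>(static_cast<std::uint32_t>(x) + signBit);
  // X = X >> 1: arithmetic (sign-propagating) shift, guaranteed since C++20.
  return biased >> 1;
}
```

Without the bias, `-3 >> 1` would give -2 (rounding toward negative infinity) instead of -1.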

@alexander-shaposhnikov

#94322 defines .preinit_array to initialize nsan early. DT_PREINIT_ARRAY can only be used with the main executable. GNU ld would complain when a DSO has .preinit_array. Therefore, nsan_preinit.cpp cannot be linked into `libclang_rt.nsan.so` (#98415). Working with @alexander-shaposhnikov, we noticed that `Nsan-x86_64-Test --gtest_output=json` without `.preinit_array` will SIGSEGV. This is because googletest with the JSON output calls `localtime_r`, which calls `free(0)` and fails when `REAL(free)` remains uninitialized (nullptr). This is benign with the default output because malloc/free are all paired and `REAL(free)(ptr)` is not called. To fix the unittest failure, `__nsan_init` needs to be called early (.preinit_array). `asan/tests/CMakeLists.txt:ASAN_UNITTEST_INSTRUMENTED_LINK_FLAGS` uses `-fsanitize=address` to ensure `asan_preinit.cpp.o` is linked into the unittest executable. Port the approach and remove `NSAN_TEST_RUNTIME_OBJECTS`. Fix #98523 Pull Request: #98564

And remove spaces around '-' printing ranges.

…unk (#98286) Most of the time, when we coalesce and delete a vsetvli, we shrink the LiveInterval of its AVL register now that there is one less use. However, there's one edge case we were missing: if we have two vsetvlis with no users of vl or vtype in between, we coalesced a vsetvli without shrinking its AVL. This fixes it by shrinking the LiveInterval whenever we delete a vsetvli. It also keeps the LiveIntervals consistent in situ by not removing the use before shrinking. This fixes a -verify-machineinstrs assertion in an MIR test case I found while investigating #97264 (comment). I couldn't recreate this at the LLVM IR level, seemingly because RISCVInsertVSETVLI will just avoid inserting extra vsetvlis that don't need to be coalesced.

…fiers (#98023) Previously, we only pushed the function scope once we entered the function definition. However, tryCaptureVariable() requires at least one function scope to be available when the ParmVarDecls being captured are owned by a function. This led to problems parsing the noexcept specifiers, as the DeclRefExprs inside them were computed incorrectly. Fixes #97453

Extract the logic whether to emit a global var based on CUDA/HIP host/device related attributes to CodeGenModule::shouldEmitCUDAGlobalVar to be used by other places.

memprof_rtl.cpp calls InitializeShadowMemory(), which dynamically ("randomly") chooses a base address for the shadow mapping, before InitializeAllocator(). If we are unlucky, the shadow memory may be mapped into the same region where the allocator wants to be. This patch fixes the issue by changing the allocator to dynamically choose a base address, as suggested by Vitaly. For comparison, HWASan already dynamically chooses the base addresses for the shadow mapping and allocator. The "unlucky" failure was observed on a new buildbot: https://lab.llvm.org/buildbot/#/builders/66/builds/1361/steps/17/logs/stdio Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>

…8427) Fixes #92530.

… CoroBegin (#98546)

…pecific CSRs (#97287) This PR is a follow-up of PR #96174 which added the framework to resolve encoding conflicts among vendor specific CSRs. This PR explicitly enables this only for the RISCV target.

The `==` operator has higher precedence than the `?:` operator. This line is evaluated as:
```cpp
assert((ArgTranslations.size() == F->isVarArg()) ? 5 : PassthroughArgSize);
```
I guess this is not what was intended. It causes a warning with GCC:
```
[131/602] Building CXX object lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64Arm64ECCallLowering.cpp.o
In file included from /usr/include/c++/11/cassert:44,
                 from /home/linux/dev/llvm-project/llvm/include/llvm/Support/CommandLine.h:33,
                 from /home/linux/dev/llvm-project/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp:31:
/home/linux/dev/llvm-project/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp: In member function ‘llvm::Function* {anonymous}::AArch64Arm64ECCallLowering::buildEntryThunk(llvm::Function*)’:
/home/linux/dev/llvm-project/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp:528:50: warning: ‘?:’ using integer constants in boolean context [-Wint-in-bool-context]
  528 |   assert(ArgTranslations.size() == F->isVarArg() ? 5 : PassthroughArgSize);
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
```
Add parentheses to fix the problem.
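The difference is easy to demonstrate in isolation. In this C++ sketch, the function names are made up and the parameters mirror the ones in the assert. The unparenthesized form yields either 5 or `passthrough`, so an `assert` built on it checks almost nothing. The parenthesized form compares `n` against the value chosen by `?:`.

```cpp
#include <cassert>
#include <cstddef>

// Parses as `(n == isVarArg) ? 5 : passthrough`: the comparison only selects
// between two integers.
std::size_t Unparenthesized(std::size_t n, bool isVarArg, std::size_t passthrough) {
  return n == isVarArg ? 5 : passthrough;
}

// The intended check: n must equal 5 for vararg functions, else passthrough.
bool Parenthesized(std::size_t n, bool isVarArg, std::size_t passthrough) {
  return n == (isVarArg ? 5 : passthrough);
}
```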

… by \" This reverts commit f8b1ca4. It incorrectly replaces `t` in `blt` with `h`:
```
.altmacro
.macro gen t
blt 2f
2:
.endm
gen h
```
Fix #98558

Before the patch, the `fixed-shadow.c` test died with an obscure SEGV because the shadow was mapped over libc.so. Note: FindDynamicShadowStart is expected to select an available region.

This patch adds the following member functions:
- User::setOperand()
- User::replaceUsesOfWith()
- Value::replaceAllUsesWith()
- Value::replaceUsesWithIf()

This is a support data structure that acts as a cache for replacer-like functions that map values between two domains. The difference compared to just using a map to cache in-out pairs is that this class is able to handle replacer logic that is self-recursive (and thus may cause infinite recursion in the naive case). This class provides a hook for the user to perform cycle pruning when a cycle is identified, and is able to perform context-sensitive caching so that the replacement result for an input that is part of a pruned cycle can be distinct from the replacement result for the same input when it is not part of a cycle. In addition, this class allows deferring cycle pruning until specific inputs are repeated. This is useful for cases where not all elements in a cycle can perform pruning. The user still must guarantee that at least one element in any given cycle can perform pruning. Even if not, an assertion will eventually be tripped instead of infinite recursion (the run-time is linearly bounded by the maximum cycle length of its input).
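The core ideas can be sketched in C++. The sketch covers cycle detection while a replacement is in progress, a pruning hook, and context-sensitive caching. `CycleAwareReplacer` is hypothetical, not LLVM's `CyclicReplacerCache` API, and it does not model deferred pruning. Each graph node is "replaced" by 1 plus the sum of its successors' replacements, and a back edge to an in-progress node is pruned to 0. A result that depends on a pruned cycle through an ancestor that is still in progress is valid only in that context, so it is not cached.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <utility>
#include <vector>

class CycleAwareReplacer {
public:
  using Graph = std::map<int, std::vector<int>>;

  explicit CycleAwareReplacer(Graph graph) : graph_(std::move(graph)) {}

  int Replace(int node) { return Visit(node).value; }
  bool IsCached(int node) const { return cache_.count(node) != 0; }

private:
  static constexpr std::size_t kNoDependency =
      std::numeric_limits<std::size_t>::max();

  struct Result {
    int value;
    std::size_t lowestPrunedDepth;  // shallowest in-progress node relied on
  };

  // Pruning hook: placeholder result used when a cycle is detected.
  static int Prune(int /*node*/) { return 0; }

  Result Visit(int node) {
    if (auto it = cache_.find(node); it != cache_.end()) {
      return {it->second, kNoDependency};
    }
    if (auto it = depth_.find(node); it != depth_.end()) {
      // Cycle: `node` is still on the stack, so prune instead of recursing.
      return {Prune(node), it->second};
    }
    const std::size_t myDepth = depth_.size();
    depth_[node] = myDepth;
    int value = 1;
    std::size_t lowest = kNoDependency;
    if (auto it = graph_.find(node); it != graph_.end()) {
      for (int succ : it->second) {
        Result r = Visit(succ);
        value += r.value;
        lowest = std::min(lowest, r.lowestPrunedDepth);
      }
    }
    depth_.erase(node);
    // Cache only if every pruned cycle was headed at this node or deeper.
    if (lowest >= myDepth) {
      cache_[node] = value;
      lowest = kNoDependency;
    }
    return {value, lowest};
  }

  Graph graph_;
  std::map<int, int> cache_;
  std::map<int, std::size_t> depth_;  // in-progress node -> stack depth
};
```

With the cycle 1 -> 2 -> 1, replacing 1 computes 2's value as 1 (its edge back to 1 is pruned) without caching it. A later, independent query for 2 sees the completed result for 1 and yields 3. This is the context sensitivity described above.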

In the .altmacro mode, an argument can be expanded even if not preceded by \. This behavior is not enabled for Darwin, which uses $ (`isIdentifierChar('$')` is true) for macro expansion. This is f8b1ca4 with a fix.

…ribute. NFC Remove mention of sharing with new and old PM. The old PM code is gone.

Argument promotion doesn't handle recursive function calls to promote arguments. This patch adds functionality to handle self recursive function calls, i.e. whose SCC size is 1. Due to complexity of ValueTracking in recursive calls with SCC size greater than 1, we bail out in such cases.

Co-authored-by: vporpo <vporpodas@google.com>

This patch introduces DeclBase::isInNamedModule API to ease the use of modules slightly.

This patch updates the function `getReductionPatternCost` to handle the cost of min/max reductions by `TTI.getMinMaxReductionCost`.

Close #98583. Currently, clang rejects the following code:
```
export module mod;
extern "C++" void func();
export extern "C++" {
    void func();
}
```
while both MSVC and GCC accept it. Clang's behavior matches the current wording, but the consensus from the discussion is that we should accept the example above, given the intent of the rule. The restriction on exporting a redeclaration of a non-exported declaration exists to make linkage clear, and that does not matter for declarations within the global module.

This patch fixes: compiler-rt/lib/hwasan/hwasan_report.cpp:331:57: error: format specifies type 'void *' but the argument has type 'const uptr *' (aka 'const unsigned long *') [-Werror,-Wformat-pedantic]

…for variable llvm.memcpy" (#98482) Reverts #98295, which reverted #97998 The failure in the "InOneWeekend" test of the HIP test suite on clang-hip-vega20 (https://lab.llvm.org/buildbot/#/builders/123/builds/1498) seems to be unrelated; I observed it (and a similar failure for the "TheNextWeek" test in the same suite) intermittently on my system, with and without the patch applied. (It occurred in 2 out of 50 repeated runs without the patch and in 1 out of 50 runs with the patch.)

This patch adds a check for the correct number of `loops` results of the `transform.structured.tile_using_for` Op to the verifier, fixing a crash. Fix #98008

…laration" (#98593) Reverts #98075 bots are broken

Patch [3/x] to fix structured bindings debug info in SROA. This function computes a fragment, a bit-extract operation if needed, and a new constant offset to describe the part of a variable covered by some memory. This generalises, simplifies, and replaces at::calculateFragmentIntersect. That version is still used as a wrapper for now, though, to keep this change NFC. The new version doesn't take a DbgRecord parameter; instead, it uses an explicit address and address offset. The old version only operates on dbg_assigns. This change means the new one can also operate on dbg_declare records easily, which it will do in a subsequent patch. The new version has a new out-param, OffsetFromLocationInBits, which is set to the difference between the first bit of the variable location and the first bit of the memory slice. This will be used in a subsequent patch in SROA to determine the new offset to use in the address expression after splitting an alloca.
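The overlap computation at the heart of this can be sketched as plain bit-interval arithmetic. The sketch is a simplified, hypothetical model: all names are made up, and the bit-extract case is not modeled. It assumes `OffsetFromLocationInBits` means "variable start minus slice start", which is one reading of the description above. The variable occupies bits `[varStart, varStart + varSize)` and the memory slice `[sliceStart, sliceStart + sliceSize)`. The fragment is their intersection, expressed relative to the start of the variable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical simplified types; all quantities are in bits.
struct Fragment {
  std::uint64_t offsetInVariable;  // first bit of the variable that is covered
  std::uint64_t size;              // number of covered bits
};

struct Intersection {
  Fragment fragment;
  std::int64_t offsetFromLocationInBits;  // assumed: varStart - sliceStart
};

// Returns nothing if the variable and the memory slice do not overlap.
std::optional<Intersection> IntersectFragment(std::int64_t varStart, std::uint64_t varSize,
                                              std::int64_t sliceStart, std::uint64_t sliceSize) {
  std::int64_t begin = std::max(varStart, sliceStart);
  std::int64_t end = std::min(varStart + static_cast<std::int64_t>(varSize),
                              sliceStart + static_cast<std::int64_t>(sliceSize));
  if (begin >= end) {
    return std::nullopt;
  }
  Fragment frag{static_cast<std::uint64_t>(begin - varStart),
                static_cast<std::uint64_t>(end - begin)};
  return Intersection{frag, varStart - sliceStart};
}
```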

- Add `MachineBlockFrequencyAnalysis`.
- Add `MachineBlockFrequencyPrinterPass`.
- Use `MachineBlockFrequencyInfoWrapperPass` in the legacy pass manager.
- `LazyMachineBlockFrequencyInfo::print` is empty; drop it as part of the new pass manager migration.

https://llvm.org/docs/CodingStandards.html tells us that we should avoid evaluating `.end()` each time if possible.
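The idiom in question, as a minimal C++ example (`Sum` is an illustrative name): `end()` is evaluated once, when the loop starts, instead of on every iteration. This is only safe when the loop body does not insert or erase elements, since that can invalidate the cached `E`.

```cpp
#include <cassert>
#include <vector>

// Evaluate end() once and keep it in E, per the LLVM Coding Standards.
int Sum(const std::vector<int> &values) {
  int total = 0;
  for (auto I = values.begin(), E = values.end(); I != E; ++I) {
    total += *I;
  }
  return total;
}
```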

The missing `-mtriple` in the test causes failures on some non-x86 default targets.

…CE_DIR is set (#98464) This was found during testing of LLVM snapshots for Fedora. This test was looking for an exact string match of the path calculated by starting with lib/liblldb and with bin/lldb. However, when CLANG_RESOURCE_DIR is set to something, e.g. "../lib/clang/19", the initial path is handled differently. Instead of taking the parent of the parent of the binary, that is foo/bin/lldb/ -> foo/, it uses the parent of the binary and appends CLANG_RESOURCE_DIR to that. This is because CLANG_RESOURCE_DIR is defined as a path relative to the parent directory of the clang binary. This means that if you start with foo/lib/liblldb the resulting path is lib/../lib/clang/19, but if you start with bin/lldb the result is bin/../lib/clang/19. I don't want to change the starting path of DefaultComputeClangResourceDirectory (which is bin/lldb), as I suspect that's chosen instead of liblldb for good reason. So the way to make this test work is to check not for exact path matches but that the "real" (".." resolved) versions of the paths are the same. That way foo/bin/../lib and foo/lib/../lib will be the same.
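The same comparison can be sketched in C++ with `std::filesystem::path::lexically_normal()`. This is only an illustration and the test itself may resolve paths differently. Unlike a "real path" lookup, lexical normalization does not touch the file system or follow symlinks; it just collapses `.` and `dir/..` components.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// True if both paths name the same location once ".." is resolved lexically.
bool SameNormalizedPath(const fs::path &a, const fs::path &b) {
  return a.lexically_normal() == b.lexically_normal();
}
```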

…6218) Restrict `DropInnerMostUnitDimsTransfer{Read|Write}` so that it fails when one of the indices to be dropped could be != 0 and "out of bounds":
```mlir
func.func @negative_example(%arg0: memref<16x1xf32>, %arg1: vector<8x1xf32>, %idx_1: index, %idx_2: index) {
  vector.transfer_write %arg1, %arg0[%idx_1, %idx_2] {in_bounds = [true, false]} : vector<8x1xf32>, memref<16x1xf32>
  return
}
```
This is an edge case that could represent an out-of-bounds access, though that will depend on the actual value of `%idx_2`. Importantly, without this change it would be transformed as follows:
```mlir
func.func @negative_example(%arg0: memref<16x1xf32>, %arg1: vector<8x1xf32>, %arg2: index, %arg3: index) {
  %subview = memref.subview %arg0[0, 0] [16, 1] [1, 1] : memref<16x1xf32> to memref<16xf32, strided<[1]>>
  %0 = vector.shape_cast %arg1 : vector<8x1xf32> to vector<8xf32>
  vector.transfer_write %0, %subview[%arg2] {in_bounds = [true]} : vector<8xf32>, memref<16xf32, strided<[1]>>
  return
}
```
This is incorrect: `%idx_2` is ignored and the "out of bounds" flag is not propagated. Hence the extra restriction to avoid such cases. NOTE: This is a follow-up for: #94904

…rict DWARF v2 mode (#98335) During testing of #96202 we found that when clang set to DWARF v2 was used to build the test file, lldb could not tell that the unsigned enum type was in fact unsigned. So it defaulted to signed and printed the wrong value. The reason for this is that DWARFv2 does not include DW_AT_type in DW_TAG_enumeration_type. This was added in DWARF v3: "The enumeration type entry may also have a DW_AT_type attribute which refers to the underlying data type used to implement the enumeration. In C or C++, the underlying type will be the appropriate integral type determined by the compiler from the properties of the enumeration literal values." I noticed that gcc does emit this attribute for DWARF v2 but not when strict DWARF is requested (more details in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16063#c7). This patch changes clang to do the same. This will improve the experience of anyone using tools that can understand the attribute but who, for whatever reason, are stuck building binaries containing v2 only. You can see a current clang/gcc comparison here: https://godbolt.org/z/eG9Kc9WGf https://reviews.llvm.org/D42734 added the original code that emitted this for >= v3 only.

To reduce build times move them to TargetCodeGenInfo. Refactor of #98329

Use Python's builtin enum class instead of writing our own. This is preparation for passing a strict type check in PR #78114, fixing 927 out of 1341 strict typing errors. Co-authored-by: Jannick Kremer <jannick-kremer@gmx.de> Co-authored-by: Vlad Serebrennikov <serebrennikov.vladislav@gmail.com>

… logical shift nodes #83840 (#86922) Resolve #83840

This shares code with WsloopOp (the changes to Wsloop should be NFC). OpenMPIRBuilder basically implements SECTIONS as a wsloop over a case statement with each SECTION as a case for a particular loopiv value. Unfortunately it proved very difficult to share code between these and ParallelOp. ParallelOp does quite a few things differently (doing more work inside of the bodygen callback and laying out blocks differently). Aligning reduction implementations for wsloop and parallel will probably involve functional changes to both, so I won't attempt that in this commit.

The tricky bit here is that we need to generate the reduction symbol mapping inside each of the nested SECTION constructs. This is a bit similar to omp.canonical_loop inside of omp.wsloop, except that the SECTION constructs come from the PFT. To make this work, I moved the lowering of the SECTION constructs inside the lowering of SECTIONS (where reduction information is still available). This subverts the normal control flow for OpenMP lowering a bit. One alternative I investigated was to generate the SECTION constructs as normal, as though there were no reduction, and then fix them up after control returns to genSectionsOp. The problem here is that the code generated for the section body has the wrong symbol mapping for the reduction variable, so all of the nested code has to be patched up. In my prototype, this was even more hacky than the solution I settled upon.

Enable flag -Wmissing-format-attribute to catch missing attributes. Fixes #60718

The caching mechanism for 'OpIsSafeForPhiOfOps' is unsound. An operand is deemed unsafe for PhiOfOps if it depends on a phi that resides in the same block as the Phi block, i.e., where we are performing the PhiOfOps. This is to avoid having to materialize the translated subexpressions. To avoid redundant code walking, a cache is used to store these results. Note, however, that since the safety is specific to the Phi block, we cannot, in general, use the cached results for other blocks. This patch addresses this by having a cache per block instead of a single one for the entire function. closes #63335
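The fix amounts to including the phi block in the cache key. This C++ sketch is hypothetical: `PhiOfOpsSafetyCache`, the integer IDs, and the `compute` callback are illustrative and do not reproduce NewGVN's code. It keys a single map by (block, operand) pairs, which is equivalent to keeping one cache per block.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Memoizes "is this operand safe for phi-of-ops in this block?".
// The answer depends on the block, so the block is part of the key.
class PhiOfOpsSafetyCache {
public:
  using BlockId = int;
  using ValueId = int;

  bool IsSafe(BlockId phiBlock, ValueId operand,
              const std::function<bool(BlockId, ValueId)> &compute) {
    auto key = std::make_pair(phiBlock, operand);
    auto it = cache_.find(key);
    if (it != cache_.end()) {
      return it->second;
    }
    bool safe = compute(phiBlock, operand);
    cache_.emplace(key, safe);
    return safe;
  }

private:
  std::map<std::pair<BlockId, ValueId>, bool> cache_;
};
```

With a cache keyed only by operand, the second query in the usage below would wrongly reuse the "unsafe" answer computed for block 1.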

…ructor" (#98612) Reverts #97337 This has caused llvm test suite failures on our bots, for example: https://lab.llvm.org/buildbot/#/builders/17/builds/709
```
FAIL: test-suite::gfortran-regression-execute-regression__char_length_21_f90.test
FAIL: test-suite::gfortran-regression-execute-regression__char_length_20_f90.test
```

…ernMatch. NFC. First tentative attempt to use SDPatternMatch for x86 combine matching - main problem so far is namespace clashing when trying to expose llvm::SDPatternMatch to the entire file.

This paper had several changes within it, and Clang implements some of the changes, but not others. This updates the status accordingly.

This is a revert of ef5e7f9 which was a temporary partial revert of 77ac823. The le32 and le64 targets are no longer necessary to retain, so this removes them entirely.

Reverts #70024 It broke several post-commit bots: https://lab.llvm.org/buildbot/#/builders/193/builds/896 https://lab.llvm.org/buildbot/#/builders/23/builds/925 https://lab.llvm.org/buildbot/#/builders/13/builds/686 and others

This paper is about whether a copy of a va_list object that was not produced by calling va_copy is valid to use. While this may work on some targets, we explicitly document it as undefined behavior for all targets so there's no confusion as to when it's valid. It's not a burden for a user to use va_copy explicitly.
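For illustration, here is a C++ sketch of traversing the variadic arguments twice (`SumTwice` is a made-up name). The second traversal uses a `va_list` produced by `va_copy`, which is always valid. A copy made by plain assignment or `memcpy` is exactly the case now documented as undefined. On some targets, for example where `va_list` is an array type, plain assignment does not even compile.

```cpp
#include <cassert>
#include <cstdarg>

// Sums `count` int arguments twice: once through the original va_list and
// once through a copy made with va_copy.
int SumTwice(int count, ...) {
  va_list args;
  va_start(args, count);
  va_list copy;
  va_copy(copy, args);  // not `copy = args;`, which is not portable
  int total = 0;
  for (int i = 0; i < count; ++i) {
    total += va_arg(args, int);
  }
  for (int i = 0; i < count; ++i) {
    total += va_arg(copy, int);
  }
  va_end(copy);
  va_end(args);
  return total;
}
```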

A type-requirement cannot be an operator-function-id. Fixes #51868

…rd diagnostic (#98613) After commit ce4aada, we observed many warnings in our internal codebase. It is infeasible to fix all at once. Currently, there is no way to disable this warning. This patch provides a way to disable it using the `-Wno-missing-dependent-template-keyword` flag.

@c01db33f

This change adds a new weak API function that makes the sanitizer ignore a call to free(), and implements the functionality in ASan and HWASan. The runtime that implements this hook can later call free() again on the same pointer, when it's actually ready for the memory to be cleaned up (making sure the hook returns zero so that the memory will actually be freed). This is needed to implement a sanitizer-compatible version of Chrome's BackupRefPtr algorithm, since process-wide double-shimming of malloc/free does not work on some platforms. Requested and designed by @c01db33f (Mark) from Project Zero. Co-authored-by: Mark Brand <markbrand@google.com>

Since #98335 clang adds DW_AT_type, unless strict DWARF is requested.

…tion` NFC

Should check host/device attributes before emitting static member of template instantiation. Fixes: #98151

When `add_flang_library` was first added, it was apparently copied over from `add_clang_library`, including its logic to determine the library type. That logic includes a workaround: if `BUILD_SHARED_LIBS` is enabled, all libraries should be built as shared, including those that are explicitly marked as `STATIC`[^1]. This is because `add_clang_library` always passes at least one of `STATIC`/`SHARED` to `llvm_add_library`, and `llvm_add_library` could not distinguish the two cases. Then, the two implementations diverged. For its runtime libraries, Flang requires some libraries to always be static libraries, so if a library is explicitly marked as `STATIC`, `BUILD_SHARED_LIBS` is ignored[^2]. I noticed the two implementations of the same functionality, modified only `add_clang_library`, and copied over the result to `add_flang_library`[^3], without noticing that they are slightly different. As a result, Flang runtime libraries would be built as shared libraries with `-DBUILD_SHARED_LIBS=ON`, which may break some build configurations[^4]. This PR fixes the problem and at the same time simplifies the library type algorithm by just passing SHARED/STATIC verbatim to `llvm_add_library`. This is effectively what [^2] should have done, instead of adding more code to undo the workaround of [^1]. Ideally, one would use
```
llvm_add_library(${name} ${ARG_STATIC} ${ARG_SHARED} [...])
```
but `ARG_STATIC`/`ARG_SHARED` as set by `cmake_parse_arguments` contain `TRUE`/`FALSE` instead of the keywords themselves. I could imagine a utility function akin to `pythonize_bool` that does this. This simplification comes with two more changes:
1. Object libraries are not explicitly requested anymore. `llvm_add_library` itself should determine whether an object library is necessary. As the comment notes, using an object library is not without problems, and it seems to be of no use here, since it works fine without an object library when in `XCODE`/`MSVC_IDE` mode.
2. The property `CLANG_STATIC_LIBS` was removed. It was `FLANG_STATIC_LIBS` before the copy-and-paste error of #93519 [^3], and it is not used anywhere. In clang, `CLANG_STATIC_LIBS` is used for `clang-shlib` to include all component libraries in a single large library. There is no equivalent `flang-shlib`.
[^1]: dbc2a12
[^2]: 3d2e05d
[^3]: #93519
[^4]: #93519 (comment)

From #98481.

This patch contains a number of small portability improvements for the test suite, making it easier to run the test suite with other standard library implementations.

- Guard checks for `_LIBCPP_HARDENING_MODE` to avoid `-Wundef`.
- Avoid defining `_LIBCPP_HARDENING_MODE` when no hardening mode is specified; we should use the default mode of the library in that case.
- Add missing includes and qualify a few function calls.
- Avoid opening namespace std to forward declare stdlib containers. The test suite should represent user code, and user code isn't allowed to do that.
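Here is a minimal sketch of the first bullet. It assumes libc++'s `_LIBCPP_HARDENING_MODE_DEBUG` value macro. The guard checks that the macros are defined before it compares them, so `-Wundef` stays quiet under libstdc++ or other implementations that do not define them. `kDebugHardening` and `debug_checks_enabled()` are hypothetical names and are not part of the test suite.

```cpp
#include <cassert>
#include <cstddef>  // any standard header typically pulls in the library's config macros

// Only compare the macros once both are known to exist; an undefined
// identifier in the comparison itself would trigger -Wundef.
// _LIBCPP_HARDENING_MODE_DEBUG is assumed to be one of libc++'s mode values.
#if defined(_LIBCPP_HARDENING_MODE) && defined(_LIBCPP_HARDENING_MODE_DEBUG) && \
    _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
constexpr bool kDebugHardening = true;
#else
// Not libc++, or libc++ in another mode: rely on the library's default.
constexpr bool kDebugHardening = false;
#endif

// Hypothetical helper a test could use to skip debug-only expectations.
constexpr bool debug_checks_enabled() { return kDebugHardening; }
```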

…) -> avx2 shift(x,amt) We need to catch this; otherwise, pre-AVX512 targets will fold this to shift_logical(and(icmp_ult(amt,BW),x),amt).
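A scalar model helps show the semantics the fold relies on. This assumes the truncated title describes a select that zeroes lanes whose shift amount is out of range. Under that assumption, each lane behaves like an AVX2 variable logical shift (`VPSRLVD`), which writes 0 for any amount greater than 31. The helper names below are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One lane with AVX2 variable-shift semantics: an amount of 32 or more
// yields 0 instead of being undefined, which is exactly
// select(icmp_ult(amt, 32), lshr(x, amt), 0).
inline std::uint32_t srlv_lane(std::uint32_t x, std::uint32_t amt) {
  return amt < 32 ? x >> amt : 0u;
}

// Lane-wise application over a vector of N elements.
template <std::size_t N>
std::array<std::uint32_t, N> srlv(const std::array<std::uint32_t, N> &x,
                                  const std::array<std::uint32_t, N> &amt) {
  std::array<std::uint32_t, N> out{};
  for (std::size_t i = 0; i < N; ++i) out[i] = srlv_lane(x[i], amt[i]);
  return out;
}
```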

…, C1). NFC Add test where the zext has an additional use, but the entire expression can be replaced with (zext X). Folding even though there is an additional use would not increase the number of instructions.

We're currently handling a special case of ptrtoint gep -> add ptrtoint. Reframe the code to make it easier to add more patterns for this transform.

PT_MemberPtr also needs its ctor/dtor called, so add that. However, this exposed a problem in initializing virtual bases, so fix that as well.

…dditional use. (#98533) We have a general fold for (zext (X +nuw C2)) + C1 --> zext (X + (C2 + trunc(C1))), but this fold is disabled if the zext has an additional use. If the two constants cancel, we can fold the whole expression to zext(X) without increasing the number of instructions.
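The cancellation case can be checked exhaustively at a small width. This sketch assumes an 8-bit `X` zero-extended to 32 bits and chooses `C1 = -C2`. In that case `trunc(C1)` equals `-C2` modulo 256, so the general fold would produce `zext(X + 0)`, and the whole expression reduces to `zext(X)`. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// For every 8-bit X where X + C2 does not wrap (the 'nuw' flag), check that
//   zext32(X + C2) + C1 == zext32(X)   with C1 = -C2 (mod 2^32).
bool zext_cancel_holds(std::uint8_t c2) {
  const std::uint32_t c1 = 0u - c2;  // C1 = -C2 in 32-bit wrapping arithmetic
  for (unsigned x = 0; x <= 0xFF; ++x) {
    if (x + c2 > 0xFF) continue;  // nuw: such X would make the add poison
    std::uint8_t sum = static_cast<std::uint8_t>(x + c2);     // X +nuw C2 in i8
    std::uint32_t lhs = static_cast<std::uint32_t>(sum) + c1;  // zext, then add C1
    if (lhs != x) return false;  // must equal zext(X)
  }
  return true;
}
```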

Try to keep the indentation width lower here.

This is an editorial change, so no tests were added.

…nt (#98550) Textual strings for architecture feature flags have not been consistently written, so there are a wide variety of styles, capitalisation, etc. and some are missing information. I have tidied this up mechanically for AArch64, so that the output of `--print-supported-extensions` looks much more consistent, since it's user-visible.

This reverts commit c45f939. This refactoring turned out to not be useful for the case I had originally in mind, so revert it for now.

…est (#97622) fixes #95944

…n. (#87190) Fixes #85560. We can forward `memcpy` as long as the actual memory location being copied has not been altered. alive2: https://alive2.llvm.org/ce/z/q9JaHV

…ues (#95554) Summary: We create the builtins separately because they must be set up before the others are built. The list is configured with a default value if not specified, but that default doesn't respect the runtimes from other targets. This patch changes the behavior to prepopulate the builtins list with all the targets that have `compiler-rt` enabled if not overridden by the user.

When building compiler-rt with MSVC, CMAKE_INSTALL_LIBDIR and CMAKE_INSTALL_BINDIR are empty. This causes an error in Findzstd.cmake like the following: CMake Error at C:/llvm/cmake/modules/Findzstd.cmake:39 (string): string sub-command REGEX, mode REPLACE: regex "$" matched an empty string. Skip the REGEX when CMAKE_INSTALL_LIBDIR and CMAKE_INSTALL_BINDIR are empty. Similar issues were reported by others in e7fc754.

… already. If the instruction has already been processed for deletion, there is no need to process it a second time; doing so may cause a compiler crash.

Remove unused includes and don't use an else after a return.

The == 0 check here was used before blocks had metadata, but it no longer works.

Sometimes, isDefined() returns true, even though the function doesn't have a body yet, but will have one later. This is for example the case when referring to a class member function via a member pointer before the member function has been fully parsed. Reject them at first and compile them later.

…8573)

The current `AttrTypeReplacer` does not allow for custom handling of replacer functions that may cause self-recursion. For example, the replacement of one attr/type may depend on the replacement of another attr/type (by calling into the replacer manually again), which in turn may depend on the replacement of the original attr/type.

To enable this functionality, this PR breaks the original `AttrTypeReplacer` into two parts:

- An uncached base version (`detail::AttrTypeReplacerBase`) that allows registering replacer functions and has logic for invoking them on attrs/types and their sub-elements.
- A cached version (`AttrTypeReplacer`) that provides the same caching as the original one. This is still the one used everywhere, and behavior is unchanged.

On top of the uncached base version, a `CyclicAttrTypeReplacer` is introduced that provides caching and cycle handling for replacer logic that is cyclic. Cycle-breaking and caching are provided by the `CyclicReplacerCache` from #98202.

Both concrete implementations of the uncached base version use CRTP to avoid dynamic dispatch. The base class merely provides replacer registration and invocation, and is not meant to be used or otherwise extended elsewhere.
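The CRTP split described above can be illustrated with a heavily simplified sketch. This is not MLIR's API. Elements are plain strings, sub-element traversal and cycle handling are left out, and all class names are hypothetical. The base stores and invokes replacer functions. The derived class adds memoization, and the base reaches it through a `static_cast` instead of a virtual call.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Element = std::string;  // hypothetical stand-in for an attr/type handle

// Uncached base: registration and invocation only. CRTP lets replace()
// forward to Derived::replaceImpl() without dynamic dispatch.
template <typename Derived>
class ReplacerBase {
public:
  using Fn = std::function<std::optional<Element>(const Element &)>;
  void addReplacement(Fn fn) { fns_.push_back(std::move(fn)); }
  Element replace(const Element &e) {
    return static_cast<Derived *>(this)->replaceImpl(e);
  }

protected:
  // In this sketch, the most recently registered function is tried first.
  Element replaceBase(const Element &e) {
    for (auto it = fns_.rbegin(); it != fns_.rend(); ++it)
      if (auto r = (*it)(e)) return *r;
    return e;  // no function applied: keep the element unchanged
  }

private:
  std::vector<Fn> fns_;
};

// Cached variant: memoizes results of the base invocation.
class CachedReplacer : public ReplacerBase<CachedReplacer> {
public:
  Element replaceImpl(const Element &e) {
    if (auto it = cache_.find(e); it != cache_.end()) return it->second;
    ++misses;  // counts how often the replacer functions actually ran
    Element r = replaceBase(e);
    cache_.emplace(e, r);
    return r;
  }
  int misses = 0;

private:
  std::map<Element, Element> cache_;
};
```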

) Use the new CyclicReplacerCache from #98202 to support importing of recursive DITypes in LLVM dialect's DebugImporter. This helps simplify the implementation, allows for separate testing of the cache infra itself, and as a result we even got more efficient translations.

…#98597) This is a part of #97655.

…e with ineligebile defaulted overloads" (#97002) (#97894) This reverts commit 567b2c6.

It's required for some test cases, but off by default on some platforms. Follow up to #96749.

We don't have any legacy pass manager CGSCC passes that modify the call graph (we only use it in the codegen pipeline to run function passes in call graph order). This is the beginning of removing CallGraphUpdater and making all the relevant CGSCC passes directly use the new pass manager APIs.

If we encounter something we don't recognize, we should conservatively avoid printing an additional suffix. For isCodeGenOnly pseudoinstructions, no encoded instruction is added to the tables that this queries, and the null case would assume true. This happens to fix the case I ran into, but it isn't a holistic fix. These really should be encoded directly in the TSFlags of the MCInstrDesc, which would allow encoding pseudos to work correctly.

This PR changes the flang frontend to create the `MaskedOp` when the `masked` directive is used in the input program. OpenMP `masked` is introduced in the 5.2 standard and allows a region inside a parallel region to be executed only by the threads the programmer specifies. This is achieved with the `filter` clause, which specifies the thread ID expected to execute the region. Other related PRs:

- [Fortran Parsing and Semantic Support](#91432) - Merged
- [MLIR Support](https://github.com/llvm/llvm-project/pull/96022/files) - Merged
- [Lowering Support](#98401) - Under Review

Instead of #ifdef guards for each individual function, #ifdef and #endif will surround all functions that have the same guard.

In RegisterContextUnwind::SavedLocationForRegister we have special logic for retrieving the Return Address register when it has the caller's return address in it. An example would be the lr register on AArch64. This register is never retrieved from a newer stack frame because it is necessarily overwritten by a normal ABI function call. We allow frame 0 to provide its lr value to get the caller's return address, if it has not been overwritten/saved to stack yet. When a function is interrupted asynchronously by a POSIX signal (sigtramp), or a fault handler more generally, the sigtramp/fault handler has the entire register context available. In this situation, if the fault handler is frame 0, the function that was async interrupted is frame 1 and frame 2's return address may still be stored in lr. We need to get the lr value for frame 1 from the fault handler in frame 0, to get the return address for frame 2. Without this fix, a frameless function that faults in a firmware environment (that's where we've seen this issue most commonly) hasn't spilled lr to stack, so we need to retrieve it from the fault handler's full register context to find the caller of the frameless function that faulted. It's an unsurprising fix; all of the work was finding exactly where in RegisterContextUnwind we were only allowing RA register use for frame 0, when it should have been frame 0 or above a fault handler function. rdar://127518945
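The condition that the fix changes can be sketched as a predicate over the frame list, with frame 0 as the youngest frame. The `Frame` struct and `may_use_live_ra_register` are hypothetical simplifications and not LLDB's types. The predicate captures only the rule stated above: frame 0, or any frame directly above a fault handler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified frame record: does this frame hold a full
// register context (sigtramp or another fault/trap handler)?
struct Frame {
  bool is_trap_handler;
};

// May frame `idx` take its caller's return address from the live value of the
// RA register (e.g. lr on AArch64)? Before the fix, only frame 0 could. The fix
// also allows the frame directly above a trap handler, because the handler
// saved the interrupted function's entire register context.
bool may_use_live_ra_register(const std::vector<Frame> &frames, std::size_t idx) {
  if (idx == 0) return true;
  return frames[idx - 1].is_trap_handler;
}
```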

This adds the ability to erase a value from a blockstore based on an iterator. For usability/testing purposes it also includes an addition operator for blockstore's iterator.

Reverts #97641. Fails under sanitizers.

The formatting looked a little off on the release notes webpage. This should address those issues.

…ion (#98079) Ignore assignments in RequiresExpr, to avoid false positives. Fixes #97972

This was accidentally introduced in #98597.

Summary: The target information needs to configure that the platform has a maximum integer size of 64 in order for it to enable i128 support. The motivation behind this patch is that the i128 libcalls seem to be the only ones used by the NVPTX backend and it would be ideal to disable those completely. That would allow LTO to optimize libcalls properly after #98512.

We can use an internal linkage variable to make it clear the variable is not exported. The special section .preinit_array is a GC root. Pull Request: #98584

- Add support for `.openbsd.mutable`

  (rebaser's note) adapted from: openbsd/src@bd249b5. New auto-coalescing sections removed.

  In the linkers, collect objects in section "openbsd.mutable" and place them into a page-aligned region in the bss, with the right markers for kernel/ld.so to identify the region and skip making it immutable. While here, fix readelf/objdump versions to show all of this.

  ok miod kettenis

- Add support for `.openbsd.syscalls`

  (rebaser's note) adapted from: openbsd/src@42a61ac

  Collect .openbsd.syscalls sections into a new PT_OPENBSD_SYSCALLS segment. This will be used soon to pin system calls to designated call sites.

  ok deraadt@

- Scope OpenBSD special section handling under that ELFOSABI

  As a preexisting comment in `ELF/Writer.cpp` says:

  > section names shouldn't be significant in ELF in spirit.

  so scoping OSABI-specific magic name hacks to just the OSABI in question limits the degree to which we deviate from that "spirit" for all other OSABIs.

  OpenBSD in particular is very fast moving, having added a number of special sections, etc. in recent years. It is unclear how possible or reasonable it is for upstream to implement all these features in any event, but scoping like this at least mitigates the fallout for other OSABI systems that wish to be more slow-moving.

Co-authored-by: deraadt <deraadt@openbsd.org>

…p`; NFC These matchers either take no predicate argument or match a specific predicate, respectively. We have a lot of cases where the Pred argument is unused, and requiring the argument reduces code clarity. Likewise, we have a lot of cases where we only pass in Pred to test equality, which the new `*Specific*` helpers can simplify. Closes #98282

…cmp eq/ne X,Pow2OrZero))`; NFC

…e X,Pow2OrZero))` `(or (icmp eq X, 0), (icmp eq X, Pow2OrZero))` --> `(icmp eq (and X, Pow2OrZero), X)` `(and (icmp ne X, 0), (icmp ne X, Pow2OrZero))` --> `(icmp ne (and X, Pow2OrZero), X)` Proofs: https://alive2.llvm.org/ce/z/nPo2BN Closes #94648
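Both folds can be confirmed by brute force at 8 bits, which is a quick complement to the Alive2 proofs. The key fact is that when `Pow2OrZero` has at most one bit set, `(X & P) == X` holds exactly for `X` in `{0, P}`. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustive 8-bit check, for every P that is zero or a power of two, of:
//   (X == 0) || (X == P)  <=>  (X & P) == X
//   (X != 0) && (X != P)  <=>  (X & P) != X
bool pow2_or_zero_fold_holds() {
  for (unsigned p = 0; p <= 0xFF; ++p) {
    if (p & (p - 1)) continue;  // keep only 0 and powers of two
    for (unsigned x = 0; x <= 0xFF; ++x) {
      bool or_form = (x == 0) || (x == p);
      bool and_form = (x != 0) && (x != p);
      if (or_form != ((x & p) == x)) return false;
      if (and_form != ((x & p) != x)) return false;
    }
  }
  return true;
}
```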

Seemingly, #96256 removed the only call to Platform::GetCachedExecutable, which broke the resolution of executable modules in the remote debugging mode (#97410). This commit fixes that.

…s. (#83277) These attributes are no longer inherited from the module flags, therefore need to be added for synthetic functions.

…ever (#97907) This patch adds an explicit timeout mechanism in the compare-and-wait test for std::atomic, ensuring that it doesn't run forever when the bug is present. This is not an issue when we run inside the CI because we specify a timeout manually, but it can be a problem when running locally, for example.

…84870) Keep falling back to `__builtin_trap` on older versions of Clang. Co-authored-by: Louis Dionne <ldionne.2@gmail.com>

Pull request #97337 was reverted by #98612 due to two failing tests in llvm-test-suite -- which I ran, as always, but must have bungled or misinterpreted (mea culpa). The failing tests were llvm-test-suite/Fortran/gfortran/regression/char_length_{20,21}.f90. They have array constructors with explicit character types whose dynamic length values are negative at runtime, which must be interpreted as zero. This patch extends the original to cover those cases.

Parsing the new input file's symbols might invalidate LTO codegen, but the semantics of deplibs require them to be parsed. Accordingly, report an error unless the file had already been added to the link. Fixes #56070

…es. (#76612) This patch extends Clang's TBAA generation code to emit distinct tags for incompatible pointer types. Pointers with different element types are incompatible if the pointee types are also incompatible (modulo sugar/modifiers). Express this in TBAA by generating different tags for pointers based on the pointer depth and pointee type. To get the TBAA tag for the pointee type it uses getTypeInfoHelper on the pointee type. (Moved from https://reviews.llvm.org/D122573) PR: #76612
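The key described above can be illustrated at the C++ level, though this is not Clang's implementation. The trait below splits a pointer type into its pointer depth and its unqualified base pointee type. Under this scheme, two pointer types would get different tags when either component differs. `PointerKey` is a hypothetical name, and stripping qualifiers only approximates "modulo sugar/modifiers".

```cpp
#include <type_traits>

// Non-pointer: depth 0, the type itself is the base.
template <typename T>
struct PointerKeyImpl {
  static constexpr int depth = 0;
  using base = T;
};

// Pointer: one more level of depth; recurse on the cv-stripped pointee.
template <typename T>
struct PointerKeyImpl<T *> {
  static constexpr int depth = 1 + PointerKeyImpl<std::remove_cv_t<T>>::depth;
  using base = typename PointerKeyImpl<std::remove_cv_t<T>>::base;
};

// Entry point: strip top-level qualifiers (e.g. int *const) first.
template <typename T>
using PointerKey = PointerKeyImpl<std::remove_cv_t<T>>;
```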

…g in libcxx/include

In LLVM 19, the old xxx_ENABLE_ASSERTIONS settings should be deprecated with the goal of removing them entirely in LLVM 20.

Reland of #97641 with sanitizer fixes This adds the ability to erase a value from a blockstore based on an iterator. For usability/testing purposes it also includes an addition operator for blockstore's iterator.

The ability to spell out and specify the resource class is necessary for testing various resource binding behaviors. Though it is not intended for users to use this in customized HLSL source code, the ability to specify the resource class via an attribute is immensely helpful for writing thorough tests. This PR introduces a new attribute, hlsl::resource_attribute, that can only be applied to structs. This attribute has exactly one required argument, which must be one of:

```
SRV
UAV
CBuffer
Sampler
```

By applying this attribute to a struct, the struct will have the `HLSLResourceClassAttr` attribute attached to it in the AST representation, which provides information on the type of resource class the struct is meant to be. The resource class data that was originally contained within the `HLSLResourceAttr` attribute has been removed in favor of this new attribute, so certain ast-dump tests need to be modified so that the same information can be represented via 2 attributes instead of one. Fixes #98193

---------

Co-authored-by: Damyan Pepper <damyanp@microsoft.com>

…r] "Nothing To Do" (#98636) The changes of https://wg21.link/p1614r2 in [meta.trans.other] were exactly reverted by https://wg21.link/LWG3380.

…75242) Previously we would ignore all undefined symbols when using `-shared` or `-pie`. All undefined symbols would be treated as imports regardless of whether those symbols were defined in any shared library. With this change we now track symbols in shared libraries and report undefined symbols in the main program by default. The old behavior is still available via the `--unresolved-symbols=import-dynamic` command line flag. The rationale for allowing this type of breaking change is that `-pie` and `-shared` are both still experimental and will warn as such, unless `--experimental-pic` is passed. As part of this change the linker now models shared library symbols via new SharedFunctionSymbol and SharedDataSymbol types. I've also added a new `--no-shlib-sigcheck` option that bypasses the checking of function signatures in shared libraries. This is specifically required by emscripten for the case where the imports/exports of shared libraries have been modified via JS type legalization (this is only needed when targeting old JS engines where bigint is not yet available). See emscripten-core/emscripten#18198

…MemCpyDependence` (#98686) Fixes #98675.

…pths,types. (#76612)" This reverts commit 038c48c. This is causing test failures in some configurations, reverted while I investigate. Failures include http://lab.llvm.org/buildbot/#/builders/11/builds/1623 http://lab.llvm.org/buildbot/#/builders/108/builds/1172

…perands being scalars and result being a 1-element vector during scalarization (#98687) This patch fixes a problem that existed before where in some situations a `UCMP`/`SCMP` node which operated on 1-element vectors had a legal result type (i.e. `v1i64` on AArch64), but illegal operands (i.e. `v1i65`). This meant that operand scalarization was performed on the node and the operands were changed to a legal scalar type, but the result wasn't. This then led to `UCMP`/`SCMP` nodes with different vector-ness of operands and result appearing in the SDAG. This patch addresses this issue by fully scalarizing the `UCMP`/`SCMP` node and then turning its result back into a 1-element vector using a `SCALAR_TO_VECTOR` node. It also adds several assertions to `SelectionDAG::getNode()` to avoid this or a similar issue arising in the future. I wasn't sure if these two changes are unrelated enough to warrant two small separate PRs, but I'm happy to split this PR into two if that's deemed more appropriate.

Summary: This patch explicitly disables runtime calls to be emitted from the NVPTX backend. This allows other utilities to know that we do not need to worry about emitting these.

armv8m builtins aren't being built because `armv8m` doesn't match any of the arm cpu strings in compiler-rt cmake files. Instead there's `armv8m.main` and `armv8m.base`. We want to use the `armv8m.main` version.

Summary: This patch implements support for variadic functions for NVPTX targets. The implementation here mainly follows what was done to implement it for AMDGPU in #93362. We change the NVPTX codegen to lower all variadic arguments to functions by-value. This creates a flattened set of arguments that the IR lowering pass converts into a struct with the proper alignment. The behavior of this function was determined by iteratively checking what the NVCC compiler generates for its output. See examples like https://godbolt.org/z/KavfTGY93. I have noted the main methods that NVIDIA uses to lower variadic functions.

1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type.
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.

This patch passes the tests in the `libc` project currently, including support for `sprintf`.
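Rules 2 and 3 above determine where each argument lands in the aggregate. The sketch below assumes every entry is a single scalar argument. It does not model the struct flattening of rule 4 and adds no trailing padding. `ArgInfo` and `layout_varargs` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One plain (already flattened) argument: its size and natural alignment.
struct ArgInfo {
  std::size_t size;
  std::size_t align;
};

std::size_t align_to(std::size_t value, std::size_t align) {
  return (value + align - 1) / align * align;
}

// Place each argument at the next offset aligned to max(4, natural alignment)
// (rules 2 and 3). Returns the offsets; `total` receives the used size.
std::vector<std::size_t> layout_varargs(const std::vector<ArgInfo> &args,
                                        std::size_t &total) {
  std::vector<std::size_t> offsets;
  std::size_t offset = 0;
  for (const ArgInfo &a : args) {
    offset = align_to(offset, std::max<std::size_t>(4, a.align));
    offsets.push_back(offset);
    offset += a.size;
  }
  total = offset;
  return offsets;
}
```

For `(int, double, int)` this places the arguments at offsets 0, 8, and 16, because the `double` forces 8-byte alignment.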

…ate updates to CMake file for sys folder (#98693) Moved sys yaml files into the sys folder. After CMake patch lands, will make appropriate changes to account for yaml, header, and .h.def files that are located within the sys folder in a separate patch.

For DXIL which is based on llvm 3.7, max supported behavior flag for module flags is 6. The commit will check all module flags, for behavior flag > 6, change it to 2 (Warning). This is to fix the behavior flag part for #96912.
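The remapping is small enough to show directly. The enumerator values follow LLVM's `Module::ModFlagBehavior` numbering, from Error = 1 through Min = 8. `legalize_for_dxil` is a hypothetical helper and not the pass's actual code.

```cpp
#include <cassert>

namespace mfb {
// Module flag behavior values as numbered in LLVM's Module::ModFlagBehavior.
enum : unsigned {
  Error = 1, Warning = 2, Require = 3, Override = 4,
  Append = 5, AppendUnique = 6, Max = 7, Min = 8,
};
}  // namespace mfb

// DXIL (LLVM 3.7-based) understands behaviors only up to AppendUnique (6);
// anything newer is downgraded to Warning (2).
unsigned legalize_for_dxil(unsigned behavior) {
  return behavior > mfb::AppendUnique ? static_cast<unsigned>(mfb::Warning)
                                      : behavior;
}
```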

…sic blocks involved in critical edge splitting (#98540) Fix an issue in #97618 - if the two basic blocks involved are not predecessor / successor to each other, treat the candidate as illegal for critical edge splitting. Closes #98477 (checked in test copied from its comment).

With ptrauth-calls, function pointers are supposed to be signed. On Darwin that includes the TLS indirection accessor (`_tlv_get_addr`). We simply sign it with the plain function-pointer schema (IA,0), which lets us do a `blraaz` when calling it. Note that this doesn't have any kind of diversity, even when function pointer diversity is enabled in the frontend. On arm64e this accessor is never signed that way, but the obvious alternative where this (or another backend-generated) function pointer needs to be diversified would need more than the "ptrauth-calls" attribute as it exists today.

Summary: This patch implements the `printf` family of functions on the GPU using the new variadic support. This patch adapts the old handling in the `rpc_fprintf` placeholder, but adds an extra RPC call to get the size of the buffer to copy. This prevents the GPU from needing to parse the string. While it's theoretically possible for the pass to know the size of the struct, it's prohibitively difficult to do while maintaining ABI compatibility with NVIDIA's varargs. Depends on #96015.

@SchrodingerZhu

Summary: These cause issues because we compile with `-Wno-error`. Remove them for now. @SchrodingerZhu.

Add hexagon to detect_target_arch, test_target macros.

Introduced with #98282

…straintSatisfaction (#98654) This expression doesn't appear to be ever used, so let's remove it from the data structure. Fixed some spelling issues as well.

suppress uninitialized werrors when building with gcc

This does not really serve any purpose nowadays.

…95969) Constructs like `__is_pointer(Foo)` are never considered to be function declarations. This matches usages in libstdc++, and we can hope no one else redefines these reserved identifiers. Fixes #95598

…nitExpr. (#98490) These are not "original initializers"; the single node underneath represents the initializing node.

Since Linux 4.7, RLIMIT_DATA may result in mmap() returning ENOMEM. Example: $ clang -fsanitize=address -o hello hello.c $ ulimit -d 100000 $ ./hello ==3349007==ERROR: AddressSanitizer failed to allocate 0x10000000 (268435456) bytes at address 7fff7000 (errno: 12) ==3349007==ReserveShadowMemoryRange failed while trying to map 0x10000000 bytes. Perhaps you're using ulimit -v Suggest checking ulimit -d in addition to ulimit -v.
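The patch itself only changes the wording of the message. As a separate illustration that goes beyond the patch, a runtime could use POSIX `getrlimit()` to check which limit is actually finite before suggesting one. `memory_limit_hint` is hypothetical and is not part of the sanitizer runtime.

```cpp
#include <cassert>
#include <string>
#include <sys/resource.h>

// Build a hint naming only the limits that are actually finite:
// RLIMIT_AS corresponds to `ulimit -v`, RLIMIT_DATA to `ulimit -d`
// (the latter also constrains mmap() on Linux 4.7 and later).
std::string memory_limit_hint() {
  std::string hint = "Perhaps you're using";
  bool any = false;
  struct rlimit rl {};
  if (getrlimit(RLIMIT_AS, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY) {
    hint += " ulimit -v";
    any = true;
  }
  if (getrlimit(RLIMIT_DATA, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY) {
    hint += any ? " or ulimit -d" : " ulimit -d";
    any = true;
  }
  return any ? hint : "No finite address-space or data-segment limit is set.";
}
```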

…or messages (#98626) "Can't open file:" and "Can't create directory:" are lacking a newline.

…lt (#97604) Fix various problems to do with the first active lane of the result of optimized fp atomics, as explained in the comment. Fixes #97554

Summary: This causes errors when running unit tests when it tries to use an invalid stdio handle. Fixes #98711

This adds an `emitc.member` and `emitc.member_of_ptr` operation for the corresponding member access operators. Furthermore, `emitc.assign` is adjusted to be used with the member access operators.

`linalg.matmul` already has an attribute for casts, defaults to signed but allowed unsigned, so the operation `linalg.matmul_unsigned` is redundant. The generalization test has an example on how to lower to unsigned matmul in linalg. This is the first PR in a list of many that will simplify the linalg operations by using similar attributes. Ref: https://discourse.llvm.org/t/rfc-transpose-attribute-for-linalg-matmul-operations/80092

before free()ing the dead blocks. Otherwise, we might end up with dangling pointers to those dead blocks.

The new transformation folds `umin(cttz(x), c)` to `cttz(x | (1 << c))` and `umin(ctlz(x), c)` to `ctlz(x | ((1 << (bitwidth - 1)) >> c))`. The transformation is only implemented for constant `c` to not increase the number of instructions. The idea of the transformation is to set the c-th lowest (for `cttz`) or highest (for `ctlz`) bit in the operand. In this way, the `cttz` or `ctlz` instruction always returns at most `c`. Alive2 proofs: https://alive2.llvm.org/ce/z/y8Hdb8 Fixes #90000
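The claim that the rewritten count is always at most `c` can be checked exhaustively at 8 bits. The reference helpers `cttz8` and `ctlz8` are hypothetical. They return the bit width for 0, matching the intrinsics when zero is not poison. Only constants `c` below the bit width are tested, because otherwise the shift amount would be out of range.

```cpp
#include <cassert>
#include <cstdint>

// Reference count-trailing-zeros / count-leading-zeros for 8-bit values.
unsigned cttz8(std::uint8_t x) {
  unsigned n = 0;
  while (n < 8 && !(x & (1u << n))) ++n;
  return n;
}
unsigned ctlz8(std::uint8_t x) {
  unsigned n = 0;
  while (n < 8 && !(x & (0x80u >> n))) ++n;
  return n;
}

// For every 8-bit x and constant c < 8, check
//   umin(cttz(x), c) == cttz(x | (1 << c))
//   umin(ctlz(x), c) == ctlz(x | ((1 << (bitwidth - 1)) >> c))
bool clamp_folds_hold() {
  for (unsigned c = 0; c < 8; ++c) {
    for (unsigned v = 0; v <= 0xFF; ++v) {
      std::uint8_t x = static_cast<std::uint8_t>(v);
      unsigned tz = cttz8(x) < c ? cttz8(x) : c;
      unsigned lz = ctlz8(x) < c ? ctlz8(x) : c;
      if (tz != cttz8(static_cast<std::uint8_t>(x | (1u << c)))) return false;
      if (lz != ctlz8(static_cast<std::uint8_t>(x | (0x80u >> c)))) return false;
    }
  }
  return true;
}
```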

All of these insert freeze due to multi-use, which is only relevant for undef values, not poison.

This PR reworks HLSL's implicit conversion sequences. Initially I was seeking to match DXC's behavior more closely, but that was leading to a pile of special case rules to tie-break ambiguous cases that should really be left as ambiguous. We've decided that we're going to break compatibility with DXC here, and we may port this new behavior over to DXC instead. This change is a bit closer to C++'s overload resolution rules, but it does have a bit of nuance around how dimension adjustment conversions are ranked. Conversion sequence ranks for HLSL are:

* Exact match
* Scalar Widening (i.e. splat)
* Promotion
* Scalar Widening with Promotion
* Conversion
* Scalar Widening with Conversion
* Dimension Reduction (i.e. truncation)
* Dimension Reduction with Promotion
* Dimension Reduction with Conversion

In this implementation I've folded the disambiguation into the conversion sequence ranks which does add some complexity as compared to C++, however this avoids needing to add special casing in `CompareStandardConversionSequences`. I believe the added conversion rank values provide a simpler approach, but feedback is appreciated. The HLSL language spec updates are in the PR here: microsoft/hlsl-specs#261

Rather than selecting the errno implementation based on the platform which doesn't provide the necessary flexibility, make it configurable. The errno value location is returned by `int *__llvm_libc_errno()` which is a common design used by other C libraries.
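The design can be sketched in a few lines. `errno` is an lvalue reached through a function that returns `int *`, so a build can swap the backing storage by changing a single definition. The `sketch::errno_location` name, the `SKETCH_ERRNO` macro, and the `thread_local` storage are assumptions for illustration and are not LLVM libc's code.

```cpp
#include <cassert>

namespace sketch {
// Hypothetical location function; a configurable build could provide a
// different definition (global, thread-local, or externally supplied).
inline int *errno_location() {
  thread_local int value = 0;  // one errno per thread in this variant
  return &value;
}
}  // namespace sketch

// Hypothetical macro, named to avoid clashing with the real `errno`.
#define SKETCH_ERRNO (*sketch::errno_location())
```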

This addresses the build error introduced in #98287 where src/errno/errno.h is included instead of the system errno.h. We instead move the declaration to libc_errno.h.

The declaration must match the previous declaration in errno.h.

The definitions must match the previous declaration in errno.h.

Related to #92812

Add tests which are not safe to vectorize because %indices are loaded in the loop and the same indices could be loaded in later iterations. Tests for #87189.

StringRef::equals has been deprecated since: commit de483ad Author: Kazu Hirata <kazu@google.com> Date: Thu May 16 00:38:37 2024 -0700

This PR resolves #96322 and implements the `signbit` macro under a new header `generic-math-macros.h`. This also removes the `TODO` in `math-macros.h` and moves `isfinite`, `isinf`, and `isnan` to the same generic maths header. Finally, it adds a test file, `generic-math-macros_test.cpp`, that covers the above 4 macros. Fixes #96322.

This is used in some embedded projects.

#98657) This patch further cleans up the implementation by removing some redundant checks and replacing cast<> with get() calls. This contribution is based on the discussion in #78735

Removing it from the codegen pipeline induces a lot of test churn because llc is no longer optimizing out implicit arguments to kernels. Mostly mechanical, but there are some creative test updates. I preferred to take the changes as-is in tests where the ABI isn't relevant. In cases where it's more relevant, or the optimize out logic was too ingrained in the test, I pre-run the optimization. Some cases manually add attributes to disable inputs.

We need to always do the CCEDiag, the report() is optional.

It can be a statement containing an expression.

We need to invert them and use the opposite shift.

This happens a lot for NonTypeTemplateParm decls.

Use -fno-rtti flag to avoid vtables in the runtime library (similarly to asan, dfsan, msan). Remove unneeded -fPIC from NSAN_CFLAGS. Fix #98767

Add `MachineOptimizationRemarkEmitterAnalysis`; the legacy version, `MachineOptimizationRemarkEmitterPass`, is already a wrapper.

They were only called once, or not at all.

For invalid cases (non-vector/complex/...), this should only happen in error cases such as the attached test case.

areInlineCompatible checks to see if CalleeTLI.OverrideAsUnavailable is a subset of OverrideAsUnavailable by computing a union of the two and comparing the union and OverrideAsUnavailable. The problem is that computing a union involves memory allocations. This patch removes the need for memory allocations by switching to BitVector::test. Note that A.test(B) returns true if A - B is non-empty. That is, !A.test(B) is true if A is a subset of B. The use of BitVector::test here saves 0.20% of heap allocations during the compilation of X86ISelLowering.cpp.ii, a preprocessed version of X86ISelLowering.cpp.
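The subset test can be sketched with `std::bitset` standing in for `llvm::BitVector`. This is an assumption for illustration and does not use BitVector's API. A is a subset of B exactly when `A & ~B` is empty, which agrees with the union-based check it replaced. `std::bitset` is fixed-size, so neither version allocates here; the saving applies to dynamically sized bit vectors.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// A is a subset of B exactly when A - B (A & ~B) has no bits set.
template <std::size_t N>
bool is_subset(const std::bitset<N> &a, const std::bitset<N> &b) {
  return (a & ~b).none();
}

// Union-based formulation that the patch replaced, for comparison.
template <std::size_t N>
bool is_subset_via_union(const std::bitset<N> &a, const std::bitset<N> &b) {
  return (a | b) == b;
}
```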

) Added a RISCV overload of `isTruncateFree` to fix the break of vnsrl described in issue #94265. Fixes #94265

This adds a tablegen pattern to use ORRWrr (mov) as opposed to i64 AND 0xffffffff, as the mov will implicitly clear the upper bits. This can be seen as a zext(trunc(..)), and could be simpler if it is eliminated.

Add tests with accesses to the same pointer with different types. At the moment, runtime checks for those accesses are incorrectly based on the smaller type.

The optimization attributes are mostly noise for the purposes of the test. Also hoping this fixes https://lab.llvm.org/buildbot/#/builders/193/builds/940, which for some reason looks like the optimization isn't running.

The current frame might not be a constructor for the object we're initializing, but a parent frame might.

…98646) The plan is to add more TernaryOp in the future (SELECT/VSELECT and FMA in particular)

Use getNameForDiagnostic(), like the CallStackFrame of the current interpreter.

The same pointer may be accessed with different types and the bound includes the size of the accessed type to compute the end. Update the cache to correctly disambiguate between different accessed types.
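Here is a hypothetical sketch of the cache keying, with an access size standing in for the accessed type. The end of a bound is `start + size`, so two accesses through the same pointer with different sizes must not share a cache entry. `BoundsCache` is illustrative only and does not reflect the actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

struct Bounds {
  std::uintptr_t start, end;  // half-open range [start, end)
};

class BoundsCache {
public:
  // Keyed on (pointer, access size): keying on the pointer alone would hand
  // back the end computed for whichever access type was seen first.
  Bounds get(std::uintptr_t ptr, std::size_t access_size) {
    auto key = std::make_pair(ptr, access_size);
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;
    Bounds b{ptr, ptr + access_size};
    cache_.emplace(key, b);
    return b;
  }
  std::size_t size() const { return cache_.size(); }

private:
  std::map<std::pair<std::uintptr_t, std::size_t>, Bounds> cache_;
};
```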

…98809) These may not get canonicalized before conversion to spirv and need to be handled during vector to spirv conversion. Because spirv does not support 1-element vectors, we can't emit `spirv.VectorShuffle` and need to lower this to `spirv.CompositeExtract`.

Implement handling for new/delete/new[]/delete[] expressions via a new `DynamicAllocator` class. This introduces four new opcodes:

- `Alloc` - Allocates one element (`new int(14)`)
- `AllocN` - Allocates N elements of the given primitive (`new int[100]`)
- `AllocCN` - Allocates N elements of the given (composite) descriptor (`new S[100]`)
- `Free` - De-allocates memory allocated using any of the above.

The usual ambiguous APInt constructor: https://lab.llvm.org/buildbot/#/builders/141/builds/764

Add ifunc-after-resolver tests to improve coverage and demonstrate the -fsanitize=kcfi issue reported at #96400.


Commits on Jul 11, 2024

Commits on Jul 12, 2024

Commits on Jul 13, 2024

Commits on Jul 14, 2024
