Make conditional discard flags available externally #3

kuhar · 2022-01-26T22:39:13Z

This is so that we can set/access these flags in other .cpp files.

This pass attempts to merge s_buffer_load_dword instructions into larger sizes. Constraints are that the resource descriptor, glc flag and size are all the same for contiguous load instructions. TODO: it should be possible to add different instructions to this as well such as s_buffer_store and the vector variants buffer_load/buffer_store V2: Fixes to sbuff-merge test V3: Fixed smrd test for smem merging V4: Fixed for gfx10 changes Change-Id: Ie98107ad3b27b3c8c2da7afc3d129336f924895a

Summary: Added intrinsics for the instructions: - buffer_load_ubyte - buffer_load_ushort - buffer_store_byte - buffer_store_short Added test cases to existing buffer load/store tests. Now that upstream supports byte/short overloads of the normal load/store intrinsics, this change is only needed until LLPC stops using the intrinsics added here. Change-Id: I1b8a0910c508c9520a84b74a14d0aea8293cef38

Vulkan exposed an issue with this for a case with v_mad_mixlo_f16 where the upper 16 bits were not cleared. Modifying this to clear the bits instead of just copying fixed the problem. V2: Fixed up "Fix issue for zext of f16 to i32" V3: Fixed fcanonicalize-elimination test Change-Id: I6128deaf8ebc3489fdd3e0caead837410bb47160

Readlane should have a uniform index - but in some cases it can be non-uniform. The original implementation of readlane defaulted to using a readfirstlane of the vgpr index on the assumption that the index would be uniform across all lanes. However, there are some cases where we might want a readlane like operation that is lowered to a waterfall that performs a readlane for each distinct index value across lanes. Essentially we form a loop that uses readfirstlane to get the first enabled lane's index - enable all lanes with the same index, perform a readlane and put the result into the vgpr result register, then disable those lanes and repeat. The pathological case will form a loop of 64 - but in most cases it will be fewer. In the case where the index IS uniform across all lanes, it will loop once, which is slightly more expensive than the original code, but not a lot more. V2: Initialize accumulator register in waterfall code V3: Fixed for gfx10 changes Change-Id: Ic52dfb14b053db7713b61c85e8ac766991aecdb8

Even though writelane doesn't have the same constraints as other valu instructions it still can't violate the >1 SGPR operand constraint Due to later register propagation (e.g. fixing up vgpr operands via readfirstlane) changing writelane to only have a single SGPR is tricky. This implementation puts a new check after SIFixSGPRCopies that prevents multiple SGPRs being used in any writelane instructions. The algorithm used is to check for trivial copy prop of constants into one of the SGPR operands and perform that if possible. If this isn't possible put an explicit copy of Src1 SGPR into M0 and use that instead (this is allowable for writelane as the constraint is for SGPR read-port and not constant-bus access). V2: Fix-up cases where writelane has 2 SGPR operands - bug fixes Update to previous commit to fix an issue where immediates are tested for copy propagation but aren't suitable as inline constants. The old method caused issues that resulted in seg faults in later cse phases. Even though writelane doesn't have the same constraints as other valu instructions it still can't violate the >1 SGPR operand constraint Due to later register propagation (e.g. fixing up vgpr operands via readfirstlane) changing writelane to only have a single SGPR is tricky. This implementation puts a new check after SIFixSGPRCopies that prevents multiple SGPRs being used in any writelane instructions. The algorithm used is to check for trivial copy prop of constants into one of the SGPR operands and perform that if possible. If this isn't possible put an explicit copy of Src1 SGPR into M0 and use that instead (this is allowable for writelane as the constraint is for SGPR read-port and not constant-bus access). Change-Id: Ic3045df22db738ca6572af00138520fd28563990

Implements 3 new intrinsics for implementing waterfall code for regions of code. Waterfall is implemented as a loop that iterates over an index of values for active lanes. For each iteration an index is picked (the first active lane) and all lanes with the same index are left active (the rest are disabled). The body of the code is then executed and the result accumulated into a result vector register. The active lane is then disabled and the next index chosen for the next iteration. The worst case for waterfall is one iteration per lane - but it is usually far less than this. The implementation uses 3 intrinsics to mark a region: - llvm.amdgcn.waterfall.begin - llvm.amdgcn.waterfall.readfirstlane - llvm.amdgcn.waterfall.end The group must contain one begin and at least one readfirstlane and end intrinsic - if the readfirstlane uses the begin index, then there will only be a single readfirstlane and not 2. The readfirstlane and end intrinsics are not limited to single dword values. The waterfall loop will enclose all instructions from the begin to the final end intrinsic. See the test case for specific examples of use. V2: [AMDGPU] Updates to waterfall intrinsic support Some fixes for waterfall support. 1. Intrinsics are better defined so enforce type correctness 2. Creation of the waterfall loop requires removing kill flag for operands moved into the loop, this can otherwise cause an assert during verification 3. Fixed issue causing a tablegen failure due to float return type being used for a scalar return value (incorrectly). The matching for intrinsic to pseudo instruction now uses 2 template parameter types to disambiguate when using float src types. 4. Ensure that any analyses are invalidated due to insertion of new basic blocks. This should have been the default anyway. 5. Updated test in light of these changes. These are good exemplars for implementors to check against. V3: [AMDGCN] Fixed waterfall intrinsic groups for multiple groups per BB For cases with more than one waterfall group per basic block, the implementation would go wrong after creating the first waterfall loop (incorrectly tracking current BB). Also added a test case. V4: [AMDGCN] Waterfall enhancements Extra waterfall test for multiple readfirstlane intrinsics. Make sure that the waterfall intrinsic clauses support multiple readfirstlane intrinsics. Extended support for begin to allow multi-dword indices for the waterfall loop. This allows the use of multiple indexes (and multiple non-uniform values) in the same waterfall loop by combining the individual indices into a single multi-dword index. This has the same worst-case of 64 iterations (wave size) as before. Added a new test case to demonstrate using multi-dword indices. Added support for i16/f16 Changed the implementation to prevent CSE from merging clauses (by tagging the intrinsics as having side effects and enhancing uniform detection to work in this case as well). Left in some support added to partially support EarlyCSE merging as it is still valid, but probably won't be triggered much now merging is disabled. Added support for a last.use intrinsic. See the test for an example of this. This works in the same way as the end intrinsic, but you tag a last_use instead (so it includes the next use in the waterfall loop). More than one use is acceptable as long as it is in the same BB. All instructions up to the last last.use will be included in the waterfall loop. An important rule for waterfall clauses (that isn't detected if violated) is that you can't have something inside the loop that is dependent on an end intrinsic in the loop - this restriction is logically consistent - you have to use two loops instead. V5: [AMDGPU] Handle unexpected uniform inputs in waterfall intrinsics Waterfall intrinsic input operands can sometimes be uniform. The waterfall handling pass should deal with this gracefully, even though this could be sub-optimal. This is an initial fix for a problem observed in a game. Subsequent better fix being worked on which should produce optimal code (by removing redundant waterfall loops entirely if all the elements turn out to be uniform anyway). V6: [AMDGPU] Improved handling of uniform waterfalls This builds on a previous fix for uniform values being specified in waterfall loop intrinsic groups. Previous fix would just remove any waterfall.readfirstlane intrinsics if the input was uniform. This fix improves on this by spotting when ALL waterfall.readfirstlanes are uniform and removing the waterfall loop entirely. The tests have been extended to test this mode. The update also handles the case where not all waterfall.readfirstlanes are removed. This is also covered in the associated tests. V7: [AMDGCN] Re-order waterfall and wholequadmode passes Running WholeQuadMode pass after waterfall can have bad consequences. Changing the order improves the outcome - but there may still be problems in some unusual cases. For the known uses of the waterfall intrinsics this change will work for now. V8: waterfall test fixup Change-Id: I95b4391b8c0570bd399d70ed64af355d13ef4c84

Summary: The test shows a case where a full use is not in a subrange because the subreg is undefined at the use point, but, after a coalesce, the subrange incorrectly still does not include the use even though the subreg is now defined at that point. The incorrect live range causes a "subrange join unreachable" assert on a later coalesce. This commit ensures that a subrange is extended to all uses that can be reached from a definition. V2: Completely different fix, instead of working round the problem in the later coalesce that actually asserted. V5: Fixed to not extend liveness to an undef use. Spun second test out into its own D51257 as it is a completely different problem with the same "subrange join unreachable" symptom. V6: Ignore debug uses. V7: Set up undefs correctly, to avoid generating invalid subranges. Differential Revision: https://reviews.llvm.org/D49097 Change-Id: I172cfe16d360690e921ebe606d3e90dd1cdd1b71

eliminateUndefCopy was incorrectly eliminating a copy that was only a partial write of the target. In the test case, the resulting incorrect live range caused a subrange join unreachable later. The test case does not fail for me until I have D49097. Differential Revision: https://reviews.llvm.org/D51257 Change-Id: Ibb767d8a016934431660764b0194634e2113fea2

Summary: findReachingDefs was incorrectly using its fast path of just blitting in the live range extension when it did not see an undef itself but encountered an undef live out of a predecessor block. Now RegisterCoalescing calls extendToIndices (D49097), the bug was causing an incorrect subrange which led to a "Couldn't join subrange!" assert in a later coalesce. Subscribers: MatzeB, jvesely, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D51574 Change-Id: I01c942b41ab5a145348289c8b221c45a2ec7200f

…vePartialRedundancy Summary: removePartialRedundancy was using extendToIndices on each subrange without passing in an Undefs vector containing main reg defs that are undef in the subrange, causing the above assert. Unfortunately I can only reproduce this in an ll test. Turning it into a mir test makes the problem go away. Subscribers: MatzeB, qcolombet, jvesely, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D51849 Change-Id: I4f3ba2afb20d79f9467afb3d882f0328d38531be

…wCopyChain valuesIdentical is called to determine whether a def can be considered an "identical erase" because following copy chains proves that the value being copied is the same value as the value of the other register. The tests show up problems where the main range decides that a def is an "identical erase" but a subrange disagrees, because following the copy chain leads to a def that does not define all subregs in the subrange (in the case of one test) or a different def to that found in the main range (in the case of the other test). The fix here is to only detect an "identical erase" in the main range. If it is found, it is automatically inherited by the corresponding def in any subrange, without further analysis of that subrange value. This fix supersedes D49535/rL338070 and D51848. Change-Id: I059b5b2273ed6f186d134b98c118fef0494e02f8

This is a temporary fix to avoid a breakage where the export(s) for a pixel shader are in control flow (due to a kill). A pixel shader must execute an export done, even with exec=0. A better fix would be to have a pixel-shader-only pass that inserts a null export done in the final basic block if there is not one already there. Change-Id: I22f201f95d52ff699aa8cac26bb019717b20432e

LLPC has .ll file library functions that break this verifier check. We need to fix those before re-enabling the check. Change-Id: I4c000793a805a50f1d91d1ae55c567a356e72519

This commit fixes a really fun bug with inst combine where you have a vector-of-pointers and only one element of this vector is used. InstCombine correctly notes that an element is not used, and then will go back through a gep into that vector-of-pointers and set all vector elements to undef. This is fine in all cases except that the langref (and a ton of places in the optimizer) requires that indexing into a struct has the same index for each element of the vector-of-pointers. To fix the bug I've just made the gep/undef propagation check that the current type being indexed into is not a struct when doing the undef propagation. Differential Revision: https://reviews.llvm.org/D60600 Change-Id: I6eba34b1cde9c14751c39f04627e39425639ab3b

…d non-divergent Summary: This fixes an issue where values are uniform inside of a loop but the uses of that value outside the loop can be divergent. This is a temporary fix until the library linker issues can be resolved in llvm. Change-Id: I94d3d2e30cc2a6ae8d59e92cadf6f1b6cb7e708b

Implement a pass, enabled by -amdgpu-scratch-bounds-checking, or the subtarget feature "enable-scratch-bounds-checks", which adds bounds checks to scratch accesses. When this pass is enabled, out-of-bounds writes have no effect and out-of-bounds reads return zero. This is useful for GFX9 where hardware no longer performs bounds checking on scratch accesses and hence page faults result for out-of-bounds accesses by generated by shaders. Change-Id: Id2ee4b1f32e70b6bde2541db755727b6a407721b

stripValuesNotDefiningMask was asserting that it did not leave an empty subrange. However that was bogus because the subrange could have been empty to start with, in the case that LiveRangeCalc::calculate saw a subreg use that caused a subrange to be created empty. Differential Revision: https://reviews.llvm.org/D63510 Change-Id: Ibf862415ea422198975cc7a2ca2d98531beec08d

…REL_OFFSET is 0" This reverts commit 58b3837. That commit was causing an unnecessary reloc when accessing a global in the same section. On PAL, that breaks our use of a read-only global variable as an optimization of a local variable with constant initialization, because the PAL loader ignores the reloc. Change-Id: I2792404227d08d98baee69504509828de7e9b36c

Implement appropriate register allocation and exec mask usage for wave32 scenarios. Change-Id: I844018f6c07fdda46366af51654017fb4b654d8a

Extended test to check wave32 and gfx10 Change-Id: I620c6d9e737ddccdfd3494abf72632d7ecc4a53f

Change-Id: I3557477344135f7dbece33a2945bcb9f4063c10f

Extend LoadStoreOptimizer to handle IMAGE_LOAD and IMAGE_SAMPLE. Change-Id: Id49c992b9781254e39e1352125f5ffb212fe4f24

Summary: Backend defaults to setting WGP mode to 0 or 1 depending on the cumode feature. For PAL clients we want PAL to set this mode. Adding check to prevent the backend overriding whatever the front end has set. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63639 Change-Id: Ia3bac3e562fb47089f05c11cae47d1c92f7195ee

Fix a breakage in areMemAccessesTriviallyDisjoint, caused by trying to merge a non-load MIMG instruction IMAGE_GET_RESINFO. Only MIMG that loads from memory should be considered. Change-Id: I012435534b3e6161ee7feb686d4711097d2ab980

Currently the atomic optimizer does not support wave32, and may generate DPP which is incompatible with GFX10. This change allows the atomic optimizer to be enabled unconditionally by the LLPC for all ASICs. Change-Id: Ia26b527675b6e07319f4299b3f82ad405916bad6

Summary: This fixes B42473 and B42706. This patch makes the SDA propagate branch divergence until the end of the RPO traversal. Before, the SyncDependenceAnalysis propagated divergence only until the IPD in rpo order. RPO is incompatible with post dominance in the presence of loops. This made the SDA crash because blocks were missed in the propagation. Reviewers: foad, nhaehnle Subscribers: jvesely, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65274 Change-Id: Ic3fcc51e5850fde3a4649ec3dd84a040a07aaca9

Summary: Add support for gfx10, where all DPP operations are confined to work within a single row of 16 lanes, and wave32. Reviewers: arsenm, sheredom, critson, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65644 Change-Id: I1f8991e9cf732e79a139e79e70a2bf62650693d4

…DD_REL_OFFSET is 0" This reverts commit ad3ea6fe0eedc9fe48b96453cfed51f8c1dd79de.

…ord in SI_PC_ADD_REL_OFFSET is 0" (From Jay) D61491 caused us to use relocs when they're not strictly necessary, to refer to symbols in the text section. This is a pessimization and it's a problem for some loaders that don't support relocs yet. Differential Revision: https://reviews.llvm.org/D65813 Change-Id: Ia84de963ad099b7d2fcfb9cd2b908fa2e5a4788b

Change-Id: I3dddc18d5dea0fd1050633e35d0ac463a9ed26ca

…9632404 Based on upstream llvm : 4edb998 [SelectionDAG] treat X constrained labels as i for asm Added AMD modification notices and removed some GPL files. Change-Id: I46d09041fccf532e2c31320842d3f932e0d088a2

This is so that we can set/access these flags in other `.cpp` files.

jayfoad · 2022-01-27T09:06:19Z

llvm/lib/Target/AMDGPU/AMDGPUConditionalDiscard.cpp

 // Enable conditional discard transformations
-static cl::opt<bool> EnableConditionalDiscardTransformations(
+opt<bool> EnableConditionalDiscardTransformations(


No particular objection, but I haven't seen this pattern before of putting the option itself (EnableConditionalDiscardTransformations) in the cl namespace. Is it used elsewhere in LLVM?

It does seem to be used occasionally elsewhere in LLVM.
Exposing this option seems fine to me.

I've seen this pattern used in LLPC, e.g.,

https://github.com/GPUOpen-Drivers/llpc/blob/dev/llpc/context/llpcCompiler.cpp#L105

https://github.com/GPUOpen-Drivers/llpc/blob/dev/llpc/util/llpcDebug.cpp#L47

https://github.com/GPUOpen-Drivers/llpc/blob/dev/llpc/util/llpcTimerProfiler.cpp#L49

We could put it in namespace llvm instead of llvm::cl. Both work for me.

We experienced some deadlocks when we used multiple threads for logging using `scan-builds` intercept-build tool when we used multiple threads by e.g. logging `make -j16` ``` (gdb) bt #0 0x00007f2bb3aff110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f2bb3af70a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007f2bb3d152e4 in ?? () #3 0x00007ffcc5f0cc80 in ?? () llvm#4 0x00007f2bb3d2bf5b in ?? () from /lib64/ld-linux-x86-64.so.2 llvm#5 0x00007f2bb3b5da27 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 llvm#6 0x00007f2bb3b5dbe0 in exit () from /lib/x86_64-linux-gnu/libc.so.6 llvm#7 0x00007f2bb3d144ee in ?? () llvm#8 0x746e692f706d742f in ?? () llvm#9 0x692d747065637265 in ?? () llvm#10 0x2f653631326b3034 in ?? () llvm#11 0x646d632e35353532 in ?? () llvm#12 0x0000000000000000 in ?? () ``` I think the gcc's exit call caused the injected `libear.so` to be unloaded by the `ld`, which in turn called the `void on_unload() __attribute__((destructor))`. That tried to acquire an already locked mutex which was left locked in the `bear_report_call()` call, that probably encountered some error and returned early when it forgot to unlock the mutex. All of these are speculation since from the backtrace I could not verify if frames 2 and 3 are in fact corresponding to the `libear.so` module. But I think it's a fairly safe bet. So, hereby I'm releasing the held mutex on *all paths*, even if some failure happens. PS: I would use lock_guards, but it's C. Reviewed-by: NoQ Differential Revision: https://reviews.llvm.org/D118439

This patch fixes a data race in IOHandlerProcessSTDIO. The race is happens between the main thread and the event handling thread. The main thread is running the IOHandler (IOHandlerProcessSTDIO::Run()) when an event comes in that makes us pop the process IO handler which involves cancelling the IOHandler (IOHandlerProcessSTDIO::Cancel). The latter calls SetIsDone(true) which modifies m_is_done. At the same time, we have the main thread reading the variable through GetIsDone(). This patch avoids the race by using a mutex to synchronize the two threads. On the event thread, in IOHandlerProcessSTDIO ::Cancel method, we obtain the lock before changing the value of m_is_done. On the main thread, in IOHandlerProcessSTDIO::Run(), we obtain the lock before reading the value of m_is_done. Additionally, we delay calling SetIsDone until after the loop exists, to avoid a potential race between the two writes. Write of size 1 at 0x00010b66bb68 by thread T7 (mutexes: write M2862, write M718324145051843688): #0 lldb_private::IOHandler::SetIsDone(bool) IOHandler.h:90 (liblldb.15.0.0git.dylib:arm64+0x971d84) #1 IOHandlerProcessSTDIO::Cancel() Process.cpp:4382 (liblldb.15.0.0git.dylib:arm64+0x5ddfec) #2 lldb_private::Debugger::PopIOHandler(std::__1::shared_ptr<lldb_private::IOHandler> const&) Debugger.cpp:1156 (liblldb.15.0.0git.dylib:arm64+0x3cb2a8) #3 lldb_private::Debugger::RemoveIOHandler(std::__1::shared_ptr<lldb_private::IOHandler> const&) Debugger.cpp:1063 (liblldb.15.0.0git.dylib:arm64+0x3cbd2c) llvm#4 lldb_private::Process::PopProcessIOHandler() Process.cpp:4487 (liblldb.15.0.0git.dylib:arm64+0x5c583c) llvm#5 lldb_private::Debugger::HandleProcessEvent(std::__1::shared_ptr<lldb_private::Event> const&) Debugger.cpp:1549 (liblldb.15.0.0git.dylib:arm64+0x3ceabc) llvm#6 lldb_private::Debugger::DefaultEventHandler() Debugger.cpp:1622 (liblldb.15.0.0git.dylib:arm64+0x3cf2c0) llvm#7 std::__1::__function::__func<lldb_private::Debugger::StartEventHandlerThread()::$_2, std::__1::allocator<lldb_private::Debugger::StartEventHandlerThread()::$_2>, void* ()>::operator()() function.h:352 (liblldb.15.0.0git.dylib:arm64+0x3d1bd8) llvm#8 lldb_private::HostNativeThreadBase::ThreadCreateTrampoline(void*) HostNativeThreadBase.cpp:62 (liblldb.15.0.0git.dylib:arm64+0x4c71ac) llvm#9 lldb_private::HostThreadMacOSX::ThreadCreateTrampoline(void*) HostThreadMacOSX.mm:18 (liblldb.15.0.0git.dylib:arm64+0x29ef544) Previous read of size 1 at 0x00010b66bb68 by main thread: #0 lldb_private::IOHandler::GetIsDone() IOHandler.h:92 (liblldb.15.0.0git.dylib:arm64+0x971db8) #1 IOHandlerProcessSTDIO::Run() Process.cpp:4339 (liblldb.15.0.0git.dylib:arm64+0x5ddc7c) #2 lldb_private::Debugger::RunIOHandlers() Debugger.cpp:982 (liblldb.15.0.0git.dylib:arm64+0x3cb48c) #3 lldb_private::CommandInterpreter::RunCommandInterpreter(lldb_private::CommandInterpreterRunOptions&) CommandInterpreter.cpp:3298 (liblldb.15.0.0git.dylib:arm64+0x506478) llvm#4 lldb::SBDebugger::RunCommandInterpreter(bool, bool) SBDebugger.cpp:1166 (liblldb.15.0.0git.dylib:arm64+0x53604) llvm#5 Driver::MainLoop() Driver.cpp:634 (lldb:arm64+0x100006294) llvm#6 main Driver.cpp:853 (lldb:arm64+0x100007344) Differential revision: https://reviews.llvm.org/D120762

…ified offset and its parents or children with spcified depth." This reverts commit a3b7cb0. symbol-offset.test fails under MSAN: [ 1] ; RUN: llvm-pdbutil yaml2pdb %p/Inputs/symbol-offset.yaml --pdb=%t.pdb [FAIL] llvm-pdbutil yaml2pdb <REDACTED>/llvm/test/tools/llvm-pdbutil/Inputs/symbol-offset.yaml --pdb=<REDACTED>/tmp/symbol-offset.test/symbol-offset.test.tmp.pdb ==9283==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x55f975e5eb91 in __libcpp_tls_set <REDACTED>/include/c++/v1/__threading_support:428:12 #1 0x55f975e5eb91 in set_pointer <REDACTED>/include/c++/v1/thread:196:5 #2 0x55f975e5eb91 in void* std::__msan::__thread_proxy<std::__msan::tuple<std::__msan::unique_ptr<std::__msan::__thread_struct, std::__msan::default_delete<std::__msan::__thread_struct> >, llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::'lambda'()::operator()() const::'lambda'()> >(void*) <REDACTED>/include/c++/v1/thread:285:27 #3 0x7f74a1e55b54 in start_thread (<REDACTED>/libpthread.so.0+0xbb54) (BuildId: 64752de50ebd1a108f4b3f8d0d7e1a13) llvm#4 0x7f74a1dc9f7e in clone (<REDACTED>/libc.so.6+0x13cf7e) (BuildId: 7cfed7708e5ab7fcb286b373de21ee76)

The asm parser had a notional distinction between parsing an operand (like "%foo" or "%4#3") and parsing a region argument (which isn't supposed to allow a result number like #3). Unfortunately the implementation has two problems: 1) It didn't actually check for the result number and reject it. parseRegionArgument and parseOperand were identical. 2) It had a lot of machinery built up around it that paralleled operand parsing. This also was functionally identical, but also had some subtle differences (e.g. the parseOptional stuff had a different result type). I thought about just removing all of this, but decided that the missing error checking was important, so I reimplemented it with a `allowResultNumber` flag on parseOperand. This keeps the codepaths unified and adds the missing error checks. Differential Revision: https://reviews.llvm.org/D124470

@rickyz

The original fix (commit 23ec578) of llvm#52787 only adds `Function`s that have `Instruction`s that directly use `BlockAddress`es into the bitcode (`FUNC_CODE_BLOCKADDR_USERS`). However, in either @rickyz's original reproducing code: ``` void f(long); __attribute__((noinline)) static void fun(long x) { f(x + 1); } void repro(void) { fun(({ label: (long)&&label; })); } ``` ``` ... define dso_local void @repro() #0 { entry: br label %label label: ; preds = %entry tail call fastcc void @fun() ret void } define internal fastcc void @fun() unnamed_addr #1 { entry: tail call void @f(i64 add (i64 ptrtoint (i8* blockaddress(@repro, %label) to i64), i64 1)) #3 ret void } ... ``` or the xfs and overlayfs in the Linux kernel, `BlockAddress`es (e.g., `i8* blockaddress(@repro, %label)`) may first compose `ConstantExpr`s (e.g., `i64 ptrtoint (i8* blockaddress(@repro, %label) to i64)`) and then used by `Instruction`s. This case is not handled by the original fix. This patch adds *indirect* users of `BlockAddress`es, i.e., the `Instruction`s using some `Constant`s which further use the `BlockAddress`es, into the bitcode as well, by doing depth-first searches. Fixes: llvm#52787 Fixes: 23ec578 ("[Bitcode] materialize Functions early when BlockAddress taken") Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D124878

This reverts commit c274b6e. The x86_64 debian bot got a failure with this patch, https://lab.llvm.org/buildbot#builders/68/builds/33078 where SymbolFile/DWARF/x86/DW_TAG_variable-DW_AT_decl_file-DW_AT_abstract_origin-crosscu1.s is crashing here - #2 0x0000000000425a9f SignalHandler(int) Signals.cpp:0:0 #3 0x00007f57160e9140 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14140) llvm#4 0x00007f570d911e43 lldb_private::SourceManager::GetFile(lldb_private::FileSpec const&) crtstuff.c:0:0 llvm#5 0x00007f570d914270 lldb_private::SourceManager::DisplaySourceLinesWithLineNumbers(lldb_private::FileSpec const&, unsigned int, unsigned int, unsigned int, unsigned int, char const*, lldb_private::Stream*, lldb_private::SymbolContextList const*) crtstuff.c:0:0 llvm#6 0x00007f570da662c8 lldb_private::StackFrame::GetStatus(lldb_private::Stream&, bool, bool, bool, char const*) crtstuff.c:0:0 I don't get a failure here my mac, I'll review this method more closely tomorrow.

…ned form The DWARF spec says: Any debugging information entry representing the declaration of an object, module, subprogram or type may have DW_AT_decl_file, DW_AT_decl_line and DW_AT_decl_column attributes, each of whose value is an unsigned integer ^^^^^^^^ constant. If however, a producer happens to emit DW_AT_decl_file / DW_AT_decl_line using a signed integer form, llvm-dwarfdump crashes, like so: (... snip ...) 0x000000b4: DW_TAG_structure_type DW_AT_name ("test_struct") DW_AT_byte_size (136) DW_AT_decl_file (llvm-dwarfdump: (... snip ...)/llvm/include/llvm/ADT/Optional.h:197: T& llvm::optional_detail::OptionalStorage<T, true>::getValue() & [with T = long unsigned int]: Assertion `hasVal' failed. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: /opt/rocm/llvm/bin/llvm-dwarfdump ./testsuite/outputs/gdb.rocm/lane-pc-vega20/lane-pc-vega20-kernel.so #0 0x000055cc8e78315f PrintStackTraceSignalHandler(void*) Signals.cpp:0:0 #1 0x000055cc8e780d3d SignalHandler(int) Signals.cpp:0:0 #2 0x00007f8f2cae8420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420) #3 0x00007f8f2c58d00b raise /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1 llvm#4 0x00007f8f2c56c859 abort /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:81:7 llvm#5 0x00007f8f2c56c729 get_sysdep_segment_value /build/glibc-SzIz7B/glibc-2.31/intl/loadmsgcat.c:509:8 llvm#6 0x00007f8f2c56c729 _nl_load_domain /build/glibc-SzIz7B/glibc-2.31/intl/loadmsgcat.c:970:34 llvm#7 0x00007f8f2c57dfd6 (/lib/x86_64-linux-gnu/libc.so.6+0x33fd6) llvm#8 0x000055cc8e58ceb9 llvm::DWARFDie::dump(llvm::raw_ostream&, unsigned int, llvm::DIDumpOptions) const (/opt/rocm/llvm/bin/llvm-dwarfdump+0x2e0eb9) llvm#9 0x000055cc8e58bec3 llvm::DWARFDie::dump(llvm::raw_ostream&, unsigned int, llvm::DIDumpOptions) const (/opt/rocm/llvm/bin/llvm-dwarfdump+0x2dfec3) llvm#10 0x000055cc8e5b28a3 llvm::DWARFCompileUnit::dump(llvm::raw_ostream&, llvm::DIDumpOptions) (.part.21) DWARFCompileUnit.cpp:0:0 Likewise with DW_AT_call_file / DW_AT_call_line. The problem is that the code in llvm/lib/DebugInfo/DWARF/DWARFDie.cpp dumping these attributes assumes that FormValue.getAsUnsignedConstant() returns an armed optional. If in debug mode, we get an assertion line the above. If in release mode, and asserts are compiled out, then we proceed as if the optional had a value, running into undefined behavior, printing whatever random value. Fix this by checking whether the optional returned by FormValue.getAsUnsignedConstant() has a value, like done in other places. In addition, DWARFVerifier.cpp is validating DW_AT_call_file / DW_AT_decl_file, but not AT_call_line / DW_AT_decl_line. This commit fixes that too. The llvm-dwarfdump/X86/verify_file_encoding.yaml testcase is extended to cover these cases. Current llvm-dwarfdump crashes running the newly-extended test. "make check-llvm-tools-llvm-dwarfdump" shows no regressions, on x86-64 GNU/Linux. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D129392

Another attempt to skip the fast-math linker test on powerpc. The test has to be skipped because there is no crtfastmath.o on powerpc. Change recommended by Amy Kwan <amyk>. See https://reviews.llvm.org/D138675

For example, if you have a chain of inlined funtions like this: 1 #include <stdlib.h> 2 int g1 = 4, g2 = 6; 3 4 static inline void bar(int q) { 5 if (q > 5) 6 abort(); 7 } 8 9 static inline void foo(int q) { 10 bar(q); 11 } 12 13 int main() { 14 foo(g1); 15 foo(g2); 16 return 0; 17 } with optimizations you could end up with a single abort call for the two inlined instances of foo(). When merging the locations for those inlined instances you would previously end up with a 0:0 location in main(). Leaving out that inlined chain from the location for the abort call could make troubleshooting difficult in some cases. This patch changes DILocation::getMergedLocation() to try to handle such cases. The function is rewritten to first find a common starting point for the two locations (same subprogram and inlined-at location), and then in reverse traverses the inlined-at chain looking for matches in each subprogram. For each subprogram, the merge function will find the nearest common scope for the two locations, and matching line and column (or set them to 0 if not matching). In the example above, you will for the abort call get a location in bar() at 6:5, inlined in foo() at 10:3, inlined in main() at 0:0 (since the two inlined functions are on different lines, but in the same scope). I have not seen anything in the DWARF standard that would disallow inlining a non-zero location at 0:0 in the inlined-at function, and both LLDB and GDB seem to accept these locations (with D142552 needed for LLDB to handle cases where the file, line and column number are all 0). One incompatibility with GDB is that it seems to ignore 0-line locations in some cases, but I am not aware of any specific issue that this patch produces related to that. With x86-64 LLDB (trunk) you previously got: frame #0: 0x00007ffff7a44930 libc.so.6`abort frame #1: 0x00005555555546ec a.out`main at merge.c:0 and will now get: frame #0: 0x[...] libc.so.6`abort frame #1: 0x[...] a.out`main [inlined] bar(q=<unavailable>) at merge.c:6:5 frame #2: 0x[...] a.out`main [inlined] foo(q=<unavailable>) at merge.c:10:3 frame #3: 0x[...] a.out`main at merge.c:0 and with x86-64 GDB (11.1) you will get: (gdb) bt #0 0x00007ffff7a44930 in abort () from /lib64/libc.so.6 #1 0x00005555555546ec in bar (q=<optimized out>) at merge.c:6 #2 foo (q=<optimized out>) at merge.c:10 #3 0x00005555555546ec in main () Reviewed By: aprantl, dblaikie Differential Revision: https://reviews.llvm.org/D142556

This change prevents rare deadlocks observed for specific macOS/iOS GUI applications which issue many `dlopen()` calls from multiple different threads at startup and where TSan finds and reports a race during startup. Providing a reliable test for this has been deemed infeasible. Although I've only observed this deadlock on Apple platforms, conceptually the cause is not confined to Apple code so the fix lives in platform-independent code. Deadlock scenario: ``` Thread 2 | Thread 4 ReportRace() | Lock internal TSan mutexes | &ctx->slot_mtx | | dlopen() interceptor | OnLibraryLoaded() | MemoryMappingLayout::DumpListOfModules() | calls dyld API, which takes internal lock | lock() interceptor | TSan tries to take internal mutexes again | &ctx->slot_mtx call into symbolizer | MemoryMappingLayout::DumpListOfModules() calls dyld API, which hangs on trying to take lock ``` Resulting in: * Thread 2 has internal TSan mutex, blocked on dyld lock * Thread 4 has dyld lock, blocked on internal TSan mutex The fix prevents this situation by not intercepting any of the calls originating from `MemoryMappingLayout::DumpListOfModules()`. Stack traces for deadlock between ReportRace() and dlopen() interceptor: ``` thread #2, queue = 'com.apple.root.default-qos' frame #0: libsystem_kernel.dylib frame #1: libclang_rt.tsan_osx_dynamic.dylib`::wrap_os_unfair_lock_lock_with_options(lock=<unavailable>, options=<unavailable>) at tsan_interceptors_mac.cpp:306:3 frame #2: dyld`dyld4::RuntimeLocks::withLoadersReadLock(this=0x000000016f21b1e0, work=0x00000001814523c0) block_pointer) at DyldRuntimeState.cpp:227:28 frame #3: dyld`dyld4::APIs::_dyld_get_image_header(this=0x0000000101012a20, imageIndex=614) at DyldAPIs.cpp:240:11 frame llvm#4: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::CurrentImageHeader(this=<unavailable>) at sanitizer_procmaps_mac.cpp:391:35 frame llvm#5: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::Next(this=0x000000016f2a2800, segment=0x000000016f2a2738) at sanitizer_procmaps_mac.cpp:397:51 frame llvm#6: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::DumpListOfModules(this=0x000000016f2a2800, modules=0x00000001011000a0) at sanitizer_procmaps_mac.cpp:460:10 frame llvm#7: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::ListOfModules::init(this=0x00000001011000a0) at sanitizer_mac.cpp:610:18 frame llvm#8: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Symbolizer::FindModuleForAddress(unsigned long) [inlined] __sanitizer::Symbolizer::RefreshModules(this=0x0000000101100078) at sanitizer_symbolizer_libcdep.cpp:185:12 frame llvm#9: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Symbolizer::FindModuleForAddress(this=0x0000000101100078, address=6465454512) at sanitizer_symbolizer_libcdep.cpp:204:5 frame llvm#10: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Symbolizer::SymbolizePC(this=0x0000000101100078, addr=6465454512) at sanitizer_symbolizer_libcdep.cpp:88:15 frame llvm#11: libclang_rt.tsan_osx_dynamic.dylib`__tsan::SymbolizeCode(addr=6465454512) at tsan_symbolize.cpp:106:35 frame llvm#12: libclang_rt.tsan_osx_dynamic.dylib`__tsan::SymbolizeStack(trace=StackTrace @ 0x0000600002d66d00) at tsan_rtl_report.cpp:112:28 frame llvm#13: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedReportBase::AddMemoryAccess(this=0x000000016f2a2a90, addr=4381057136, external_tag=<unavailable>, s=<unavailable>, tid=<unavailable>, stack=<unavailable>, mset=0x00000001012fc310) at tsan_rtl_report.cpp:190:16 frame llvm#14: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ReportRace(thr=0x00000001012fc000, shadow_mem=0x000008020a4340e0, cur=<unavailable>, old=<unavailable>, typ0=1) at tsan_rtl_report.cpp:795:9 frame llvm#15: libclang_rt.tsan_osx_dynamic.dylib`__tsan::DoReportRace(thr=0x00000001012fc000, shadow_mem=0x000008020a4340e0, cur=Shadow @ x22, old=Shadow @ 0x0000600002d6b4f0, typ=1) at tsan_rtl_access.cpp:166:3 frame llvm#16: libclang_rt.tsan_osx_dynamic.dylib`::__tsan_read8(void *) at tsan_rtl_access.cpp:220:5 frame llvm#17: libclang_rt.tsan_osx_dynamic.dylib`::__tsan_read8(void *) [inlined] __tsan::MemoryAccess(thr=0x00000001012fc000, pc=<unavailable>, addr=<unavailable>, size=8, typ=1) at tsan_rtl_access.cpp:442:3 frame llvm#18: libclang_rt.tsan_osx_dynamic.dylib`::__tsan_read8(addr=<unavailable>) at tsan_interface.inc:34:3 <call into TSan from from instrumented code> thread llvm#4, queue = 'com.apple.dock.fullscreen' frame #0: libsystem_kernel.dylib frame #1: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::FutexWait(p=<unavailable>, cmp=<unavailable>) at sanitizer_mac.cpp:540:3 frame #2: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Semaphore::Wait(this=<unavailable>) at sanitizer_mutex.cpp:35:7 frame #3: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Mutex::Lock(this=0x0000000102992a80) at sanitizer_mutex.h:196:18 frame llvm#4: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock(this=<unavailable>, mu=0x0000000102992a80) at sanitizer_mutex.h:383:10 frame llvm#5: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock(this=<unavailable>, mu=0x0000000102992a80) at sanitizer_mutex.h:382:77 frame llvm#6: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() at tsan_rtl.h:708:10 frame llvm#7: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __tsan::TryTraceFunc(thr=0x000000010f084000, pc=0) at tsan_rtl.h:751:7 frame llvm#8: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __tsan::FuncExit(thr=0x000000010f084000) at tsan_rtl.h:798:7 frame llvm#9: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor(this=0x000000016f3ba280) at tsan_interceptors_posix.cpp:300:5 frame llvm#10: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor(this=<unavailable>) at tsan_interceptors_posix.cpp:293:41 frame llvm#11: libclang_rt.tsan_osx_dynamic.dylib`::wrap_os_unfair_lock_lock_with_options(lock=0x000000016f21b1e8, options=OS_UNFAIR_LOCK_NONE) at tsan_interceptors_mac.cpp:310:1 frame llvm#12: dyld`dyld4::RuntimeLocks::withLoadersReadLock(this=0x000000016f21b1e0, work=0x00000001814525d4) block_pointer) at DyldRuntimeState.cpp:227:28 frame llvm#13: dyld`dyld4::APIs::_dyld_get_image_vmaddr_slide(this=0x0000000101012a20, imageIndex=412) at DyldAPIs.cpp:273:11 frame llvm#14: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::Next(__sanitizer::MemoryMappedSegment*) at sanitizer_procmaps_mac.cpp:286:17 frame llvm#15: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::Next(this=0x000000016f3ba560, segment=0x000000016f3ba498) at sanitizer_procmaps_mac.cpp:432:15 frame llvm#16: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::DumpListOfModules(this=0x000000016f3ba560, modules=0x000000016f3ba618) at sanitizer_procmaps_mac.cpp:460:10 frame llvm#17: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::ListOfModules::init(this=0x000000016f3ba618) at sanitizer_mac.cpp:610:18 frame llvm#18: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::LibIgnore::OnLibraryLoaded(this=0x0000000101f3aa40, name="<some library>") at sanitizer_libignore.cpp:54:11 frame llvm#19: libclang_rt.tsan_osx_dynamic.dylib`::wrap_dlopen(filename="<some library>", flag=<unavailable>) at sanitizer_common_interceptors.inc:6466:3 <library code> ``` rdar://106766395 Differential Revision: https://reviews.llvm.org/D146593

…est unittest Need to finalize the DIBuilder to avoid leak sanitizer errors like this: Direct leak of 48 byte(s) in 1 object(s) allocated from: #0 0x55c99ea1761d in operator new(unsigned long) #1 0x55c9a518ae49 in operator new #2 0x55c9a518ae49 in llvm::MDTuple::getImpl(...) #3 0x55c9a4f1b1ec in getTemporary llvm#4 0x55c9a4f1b1ec in llvm::DIBuilder::createFunction(...)

Running this on Amazon Ubuntu the final backtrace is: ``` (lldb) thread backtrace * thread #1, name = 'a.out', stop reason = breakpoint 1.1 * frame #0: 0x0000aaaaaaaa07d0 a.out`func_c at main.c:10:3 frame #1: 0x0000aaaaaaaa07c4 a.out`func_b at main.c:14:3 frame #2: 0x0000aaaaaaaa07b4 a.out`func_a at main.c:18:3 frame #3: 0x0000aaaaaaaa07a4 a.out`main(argc=<unavailable>, argv=<unavailable>) at main.c:22:3 frame llvm#4: 0x0000fffff7b373fc libc.so.6`___lldb_unnamed_symbol2962 + 108 frame llvm#5: 0x0000fffff7b374cc libc.so.6`__libc_start_main + 152 frame llvm#6: 0x0000aaaaaaaa06b0 a.out`_start + 48 ``` This causes the test to fail because of the extra ___lldb_unnamed_symbol2962 frame (an inlined function?). To fix this, strictly check all the frames in main.c then for the rest just check we find __libc_start_main and _start in that order regardless of other frames in between. Reviewed By: omjavaid Differential Revision: https://reviews.llvm.org/D154204

…tput The crash happens in clang::driver::tools::SplitDebugName when Output is InputInfo::Nothing. It doesn't happen with standalone clang driver because output is created in Driver::BuildJobsForActionNoCache. Example backtrace: ``` * thread #1, name = 'clangd', stop reason = hit program assert * frame #0: 0x00007ffff5c4eacf libc.so.6`raise + 271 frame #1: 0x00007ffff5c21ea5 libc.so.6`abort + 295 frame #2: 0x00007ffff5c21d79 libc.so.6`__assert_fail_base.cold.0 + 15 frame #3: 0x00007ffff5c47426 libc.so.6`__assert_fail + 70 frame llvm#4: 0x000055555dc0923c clangd`clang::driver::InputInfo::getFilename(this=0x00007fffffff9398) const at InputInfo.h:84:5 frame llvm#5: 0x000055555dcd0d8d clangd`clang::driver::tools::SplitDebugName(JA=0x000055555f6c6a50, Args=0x000055555f6d0b80, Input=0x00007fffffff9678, Output=0x00007fffffff9398) at CommonArgs.cpp:1275:40 frame llvm#6: 0x000055555dc955a5 clangd`clang::driver::tools::Clang::ConstructJob(this=0x000055555f6c69d0, C=0x000055555f6c64a0, JA=0x000055555f6c6a50, Output=0x00007fffffff9398, Inputs=0x00007fffffff9668, Args=0x000055555f6d0b80, LinkingOutput=0x0000000000000000) const at Clang.cpp:5690:33 frame llvm#7: 0x000055555dbf6b54 clangd`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fffffffb5e0, C=0x000055555f6c64a0, A=0x000055555f6c6a50, TC=0x000055555f6c4be0, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=1, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:5618:10 frame llvm#8: 0x000055555dbf4ef0 clangd`clang::driver::Driver::BuildJobsForAction(this=0x00007fffffffb5e0, C=0x000055555f6c64a0, A=0x000055555f6c6a50, TC=0x000055555f6c4be0, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=1, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:5306:26 frame llvm#9: 0x000055555dbeb590 clangd`clang::driver::Driver::BuildJobs(this=0x00007fffffffb5e0, C=0x000055555f6c64a0) const at Driver.cpp:4844:5 frame llvm#10: 0x000055555dbe6b0f clangd`clang::driver::Driver::BuildCompilation(this=0x00007fffffffb5e0, ArgList=ArrayRef<const char *> @ 0x00007fffffffb268) at Driver.cpp:1496:3 frame llvm#11: 0x000055555b0cc0d9 clangd`clang::createInvocation(ArgList=ArrayRef<const char *> @ 0x00007fffffffbb38, Opts=CreateInvocationOptions @ 0x00007fffffffbb90) at CreateInvocationFromCommandLine.cpp:53:52 frame llvm#12: 0x000055555b378e7b clangd`clang::clangd::buildCompilerInvocation(Inputs=0x00007fffffffca58, D=0x00007fffffffc158, CC1Args=size=0) at Compiler.cpp:116:44 frame llvm#13: 0x000055555895a6c8 clangd`clang::clangd::(anonymous namespace)::Checker::buildInvocation(this=0x00007fffffffc760, TFS=0x00007fffffffe570, Contents= Has Value=false ) at Check.cpp:212:9 frame llvm#14: 0x0000555558959cec clangd`clang::clangd::check(File=(Data = "build/test.cpp", Length = 64), TFS=0x00007fffffffe570, Opts=0x00007fffffffe600) at Check.cpp:486:34 frame llvm#15: 0x000055555892164a clangd`main(argc=4, argv=0x00007fffffffecd8) at ClangdMain.cpp:993:12 frame llvm#16: 0x00007ffff5c3ad85 libc.so.6`__libc_start_main + 229 frame llvm#17: 0x00005555585bbe9e clangd`_start + 46 ``` Test Plan: ninja ClangDriverTests && tools/clang/unittests/Driver/ClangDriverTests Differential Revision: https://reviews.llvm.org/D154602

TSan reports the following data race: Write of size 4 at 0x000109e0b160 by thread T2 (mutexes: write M0, write M1): #0 NativeFile::Close() File.cpp:329 #1 ConnectionFileDescriptor::Disconnect(lldb_private::Status*) ConnectionFileDescriptorPosix.cpp:232 #2 Communication::Disconnect(lldb_private::Status*) Communication.cpp:61 #3 process_gdb_remote::ProcessGDBRemote::DidExit() ProcessGDBRemote.cpp:1164 llvm#4 Process::SetExitStatus(int, char const*) Process.cpp:1097 llvm#5 process_gdb_remote::ProcessGDBRemote::MonitorDebugserverProcess(...) ProcessGDBRemote.cpp:3387 Previous read of size 4 at 0x000109e0b160 by main thread (mutexes: write M2): #0 NativeFile::IsValid() const File.h:393 #1 ConnectionFileDescriptor::IsConnected() const ConnectionFileDescriptorPosix.cpp:121 #2 Communication::IsConnected() const Communication.cpp:79 #3 process_gdb_remote::GDBRemoteCommunication::WaitForPacketNoLock(...) GDBRemoteCommunication.cpp:256 llvm#4 process_gdb_remote::GDBRemoteCommunication::WaitForPacketNoLock(...l) GDBRemoteCommunication.cpp:244 llvm#5 process_gdb_remote::GDBRemoteClientBase::SendPacketAndWaitForResponseNoLock(llvm::StringRef, StringExtractorGDBRemote&) GDBRemoteClientBase.cpp:246 The problem is that in WaitForPacketNoLock's run loop, it checks that the connection is still connected. This races with the ConnectionFileDescriptor disconnecting. Most (but not all) access to the IOObject in ConnectionFileDescriptorPosix is already gated by the mutex. This patch just protects IsConnected in the same way. Differential revision: https://reviews.llvm.org/D157347

Thread sanitizer reports the following data race: ``` WARNING: ThreadSanitizer: data race (pid=43201) Write of size 4 at 0x00010520c474 by thread T1 (mutexes: write M0, write M1): #0 lldb_private::PipePosix::CloseWriteFileDescriptor() PipePosix.cpp:242 (liblldb.18.0.0git.dylib:arm64+0x414700) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00) #1 lldb_private::PipePosix::Close() PipePosix.cpp:217 (liblldb.18.0.0git.dylib:arm64+0x4144e8) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00) #2 lldb_private::ConnectionFileDescriptor::Disconnect(lldb_private::Status*) ConnectionFileDescriptorPosix.cpp:239 (liblldb.18.0.0git.dylib:arm64+0x40a620) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00) #3 lldb_private::Communication::Disconnect(lldb_private::Status*) Communication.cpp:61 (liblldb.18.0.0git.dylib:arm64+0x2a9318) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00) llvm#4 lldb_private::process_gdb_remote::ProcessGDBRemote::DidExit() ProcessGDBRemote.cpp:1167 (liblldb.18.0.0git.dylib:arm64+0x8ed984) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00) Previous read of size 4 at 0x00010520c474 by main thread (mutexes: write M2, write M3): #0 lldb_private::PipePosix::CanWrite() const PipePosix.cpp:229 (liblldb.18.0.0git.dylib:arm64+0x4145e4) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00) #1 lldb_private::ConnectionFileDescriptor::Disconnect(lldb_private::Status*) ConnectionFileDescriptorPosix.cpp:212 (liblldb.18.0.0git.dylib:arm64+0x40a4a8) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00) #2 lldb_private::Communication::Disconnect(lldb_private::Status*) Communication.cpp:61 (liblldb.18.0.0git.dylib:arm64+0x2a9318) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00) #3 lldb_private::process_gdb_remote::GDBRemoteCommunication::WaitForPacketNoLock(StringExtractorGDBRemote&, lldb_private::Timeout<std::__1::ratio<1l, 1000000l>>, bool) GDBRemoteCommunication.cpp:373 (liblldb.18.0.0git.dylib:arm64+0x8b9c48) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00) llvm#4 lldb_private::process_gdb_remote::GDBRemoteCommunication::WaitForPacketNoLock(StringExtractorGDBRemote&, lldb_private::Timeout<std::__1::ratio<1l, 1000000l>>, bool) GDBRemoteCommunication.cpp:243 (liblldb.18.0.0git.dylib:arm64+0x8b9904) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00) ``` Fix this by adding a mutex to PipePosix. Differential Revision: https://reviews.llvm.org/D157654

This reverts commit 0e63f1a. clang-format started to crash with contents like: a.h: ``` ``` $ clang-format a.h ``` PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: ../llvm/build/bin/clang-format a.h #0 0x0000560b689fe177 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Support/Unix/Signals.inc:723:13 #1 0x0000560b689fbfbe llvm::sys::RunSignalHandlers() /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Support/Signals.cpp:106:18 #2 0x0000560b689feaca SignalHandler(int) /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Support/Unix/Signals.inc:413:1 #3 0x00007f030405a540 (/lib/x86_64-linux-gnu/libc.so.6+0x3c540) llvm#4 0x0000560b68a9a980 is /usr/local/google/home/kadircet/repos/llvm/clang/include/clang/Lex/Token.h:98:44 llvm#5 0x0000560b68a9a980 is /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/FormatToken.h:562:51 llvm#6 0x0000560b68a9a980 startsSequenceInternal<clang::tok::TokenKind, clang::tok::TokenKind> /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/FormatToken.h:831:9 llvm#7 0x0000560b68a9a980 startsSequence<clang::tok::TokenKind, clang::tok::TokenKind> /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/FormatToken.h:600:12 llvm#8 0x0000560b68a9a980 getFunctionName /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/TokenAnnotator.cpp:3131:17 llvm#9 0x0000560b68a9a980 clang::format::TokenAnnotator::annotate(clang::format::AnnotatedLine&) /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/TokenAnnotator.cpp:3191:17 Segmentation fault ```

…fine.parallel verifier This patch updates AffineParallelOp::verify() to check each result type matches its corresponding reduction op (i.e, the result type must be a `FloatType` if the reduction attribute is `addf`) affine.parallel will crash on --lower-affine if the corresponding result type cannot match the reduction attribute. ``` %128 = affine.parallel (%arg2, %arg3) = (0, 0) to (8, 7) reduce ("maxf") -> (memref<8x7xf32>) { %alloc_33 = memref.alloc() : memref<8x7xf32> affine.yield %alloc_33 : memref<8x7xf32> } ``` This will crash and report a type conversion issue when we run `mlir-opt --lower-affine` ``` Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 572. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: mlir-opt --lower-affine temp.mlir #0 0x0000000102a18f18 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/workspacebin/mlir-opt+0x1002f8f18) #1 0x0000000102a171b4 llvm::sys::RunSignalHandlers() (/workspacebin/mlir-opt+0x1002f71b4) #2 0x0000000102a195c4 SignalHandler(int) (/workspacebin/mlir-opt+0x1002f95c4) #3 0x00000001be7894c4 (/usr/lib/system/libsystem_platform.dylib+0x1803414c4) llvm#4 0x00000001be771ee0 (/usr/lib/system/libsystem_pthread.dylib+0x180329ee0) llvm#5 0x00000001be6ac340 (/usr/lib/system/libsystem_c.dylib+0x180264340) llvm#6 0x00000001be6ab754 (/usr/lib/system/libsystem_c.dylib+0x180263754) llvm#7 0x0000000106864790 mlir::arith::getIdentityValueAttr(mlir::arith::AtomicRMWKind, mlir::Type, mlir::OpBuilder&, mlir::Location) (.cold.4) (/workspacebin/mlir-opt+0x104144790) llvm#8 0x0000000102ba66ac mlir::arith::getIdentityValueAttr(mlir::arith::AtomicRMWKind, mlir::Type, mlir::OpBuilder&, mlir::Location) (/workspacebin/mlir-opt+0x1004866ac) llvm#9 0x0000000102ba6910 mlir::arith::getIdentityValue(mlir::arith::AtomicRMWKind, mlir::Type, mlir::OpBuilder&, mlir::Location) (/workspacebin/mlir-opt+0x100486910) ... ``` Fixes llvm#64068 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D157985

…tePluginObject After llvm#68052 this function changed from returning a nullptr with `return {};` to returning Expected and hitting `llvm_unreachable` before it could do so. I gather that we're never supposed to call this function, but on Windows we actually do call this function because `interpreter->CreateScriptedProcessInterface()` returns `ScriptedProcessInterface` not `ScriptedProcessPythonInterface`. Likely because `target_sp->GetDebugger().GetScriptInterpreter()` also does not return a Python related class. The previously XFAILed test crashed with: ``` # .---command stderr------------ # | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. # | Stack dump: # | 0. Program arguments: c:\\users\\tcwg\\david.spickett\\build-llvm\\bin\\lldb-test.exe ir-memory-map C:\\Users\\tcwg\\david.spickett\\build-llvm\\tools\\lldb\\test\\Shell\\Expr\\Output\\TestIRMemoryMapWindows.test.tmp C:\\Users\\tcwg\\david.spickett\\llvm-project\\lldb\\test\\Shell\\Expr/Inputs/ir-memory-map-basic # | 1. HandleCommand(command = "run") # | Exception Code: 0xC000001D # | #0 0x00007ff696b5f588 lldb_private::ScriptedProcessInterface::CreatePluginObject(class llvm::StringRef, class lldb_private::ExecutionContext &, class std::shared_ptr<class lldb_private::StructuredData::Dictionary>, class lldb_private::StructuredData::Generic *) C:\Users\tcwg\david.spickett\llvm-project\lldb\include\lldb\Interpreter\Interfaces\ScriptedProcessInterface.h:28:0 # | #1 0x00007ff696b1d808 llvm::Expected<std::shared_ptr<lldb_private::StructuredData::Generic> >::operator bool C:\Users\tcwg\david.spickett\llvm-project\llvm\include\llvm\Support\Error.h:567:0 # | #2 0x00007ff696b1d808 lldb_private::ScriptedProcess::ScriptedProcess(class std::shared_ptr<class lldb_private::Target>, class std::shared_ptr<class lldb_private::Listener>, class lldb_private::ScriptedMetadata const &, class lldb_private::Status &) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Plugins\Process\scripted\ScriptedProcess.cpp:115:0 # | #3 0x00007ff696b1d124 std::shared_ptr<lldb_private::ScriptedProcess>::shared_ptr C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1478:0 # | llvm#4 0x00007ff696b1d124 lldb_private::ScriptedProcess::CreateInstance(class std::shared_ptr<class lldb_private::Target>, class std::shared_ptr<class lldb_private::Listener>, class lldb_private::FileSpec const *, bool) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Plugins\Process\scripted\ScriptedProcess.cpp:61:0 # | llvm#5 0x00007ff69699c8f4 std::_Ptr_base<lldb_private::Process>::_Move_construct_from C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1237:0 # | llvm#6 0x00007ff69699c8f4 std::shared_ptr<lldb_private::Process>::shared_ptr C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1534:0 # | llvm#7 0x00007ff69699c8f4 std::shared_ptr<lldb_private::Process>::operator= C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1594:0 # | llvm#8 0x00007ff69699c8f4 lldb_private::Process::FindPlugin(class std::shared_ptr<class lldb_private::Target>, class llvm::StringRef, class std::shared_ptr<class lldb_private::Listener>, class lldb_private::FileSpec const *, bool) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Target\Process.cpp:396:0 # | llvm#9 0x00007ff6969bd708 std::_Ptr_base<lldb_private::Process>::_Move_construct_from C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1237:0 # | llvm#10 0x00007ff6969bd708 std::shared_ptr<lldb_private::Process>::shared_ptr C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1534:0 # | llvm#11 0x00007ff6969bd708 std::shared_ptr<lldb_private::Process>::operator= C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1594:0 # | llvm#12 0x00007ff6969bd708 lldb_private::Target::CreateProcess(class std::shared_ptr<class lldb_private::Listener>, class llvm::StringRef, class lldb_private::FileSpec const *, bool) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Target\Target.cpp:215:0 # | llvm#13 0x00007ff696b13af0 std::_Ptr_base<lldb_private::Process>::_Ptr_base C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1230:0 # | llvm#14 0x00007ff696b13af0 std::shared_ptr<lldb_private::Process>::shared_ptr C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1524:0 # | llvm#15 0x00007ff696b13af0 lldb_private::PlatformWindows::DebugProcess(class lldb_private::ProcessLaunchInfo &, class lldb_private::Debugger &, class lldb_private::Target &, class lldb_private::Status &) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Plugins\Platform\Windows\PlatformWindows.cpp:495:0 # | llvm#16 0x00007ff6969cf590 std::_Ptr_base<lldb_private::Process>::_Move_construct_from C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1237:0 # | llvm#17 0x00007ff6969cf590 std::shared_ptr<lldb_private::Process>::shared_ptr C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1534:0 # | llvm#18 0x00007ff6969cf590 std::shared_ptr<lldb_private::Process>::operator= C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.35.32124\include\memory:1594:0 # | llvm#19 0x00007ff6969cf590 lldb_private::Target::Launch(class lldb_private::ProcessLaunchInfo &, class lldb_private::Stream *) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Target\Target.cpp:3274:0 # | llvm#20 0x00007ff696fff82c CommandObjectProcessLaunch::DoExecute(class lldb_private::Args &, class lldb_private::CommandReturnObject &) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Commands\CommandObjectProcess.cpp:258:0 # | llvm#21 0x00007ff696fab6c0 lldb_private::CommandObjectParsed::Execute(char const *, class lldb_private::CommandReturnObject &) C:\Users\tcwg\david.spickett\llvm-project\lldb\source\Interpreter\CommandObject.cpp:751:0 # `----------------------------- # error: command failed with exit status: 0xc000001d ``` That might be a bug on the Windows side, or an artifact of how our build is setup, but whatever it is, having `CreatePluginObject` return an error and the caller check it, fixes the failing test. The built lldb can run the script command to use Python, but I'm not sure if that means anything.

… on (llvm#74207) lld string tail merging interacts badly with ASAN on Windows, as is reported in llvm#62078. A similar error was found when building LLVM with `-DLLVM_USE_SANITIZER=Address`: ```console [2/2] Building GenVT.inc... FAILED: include/llvm/CodeGen/GenVT.inc C:/Dev/llvm-project/Build_asan/include/llvm/CodeGen/GenVT.inc cmd.exe /C "cd /D C:\Dev\llvm-project\Build_asan && C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe -gen-vt -I C:/Dev/llvm-project/llvm/include/llvm/CodeGen -IC:/Dev/llvm-project/Build_asan/include -IC:/Dev/llvm-project/llvm/include C:/Dev/llvm-project/llvm/include/llvm/CodeGen/ValueTypes.td --write-if-changed -o include/llvm/CodeGen/GenVT.inc -d include/llvm/CodeGen/GenVT.inc.d" ================================================================= ==31944==ERROR: AddressSanitizer: global-buffer-overflow on address 0x7ff6cff80d20 at pc 0x7ff6cfcc7378 bp 0x00e8bcb8e990 sp 0x00e8bcb8e9d8 READ of size 1 at 0x7ff6cff80d20 thread T0 #0 0x7ff6cfcc7377 in strlen (C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe+0x1400a7377) #1 0x7ff6cfde50c2 in operator delete(void *, unsigned __int64) (C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe+0x1401c50c2) #2 0x7ff6cfdd75ef in operator delete(void *, unsigned __int64) (C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe+0x1401b75ef) #3 0x7ff6cfde59f9 in operator delete(void *, unsigned __int64) (C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe+0x1401c59f9) llvm#4 0x7ff6cff03f6c in operator delete(void *, unsigned __int64) (C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe+0x1402e3f6c) llvm#5 0x7ff6cfefbcbc in operator delete(void *, unsigned __int64) (C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe+0x1402dbcbc) llvm#6 0x7ffb7f247343 (C:\WINDOWS\System32\KERNEL32.DLL+0x180017343) llvm#7 0x7ffb800826b0 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800526b0) 0x7ff6cff80d20 is located 31 bytes after global variable '"#error \"ArgKind is not defined\"\n"...' defined in 'C:\Dev\llvm-project\llvm\utils\TableGen\IntrinsicEmitter.cpp' (0x7ff6cff80ce0) of size 33 '"#error \"ArgKind is not defined\"\n"...' is ascii string '#error "ArgKind is not defined" ' 0x7ff6cff80d20 is located 0 bytes inside of global variable '""' defined in 'C:\Dev\llvm-project\llvm\utils\TableGen\IntrinsicEmitter.cpp' (0x7ff6cff80d20) of size 1 '""' is ascii string '' SUMMARY: AddressSanitizer: global-buffer-overflow (C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe+0x1400a7377) in strlen Shadow bytes around the buggy address: 0x7ff6cff80a80: 01 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 01 f9 f9 f9 0x7ff6cff80b00: f9 f9 f9 f9 00 00 00 00 00 00 00 00 01 f9 f9 f9 0x7ff6cff80b80: f9 f9 f9 f9 00 00 00 00 01 f9 f9 f9 f9 f9 f9 f9 0x7ff6cff80c00: 00 00 00 00 01 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 0x7ff6cff80c80: 00 00 00 00 01 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 =>0x7ff6cff80d00: 01 f9 f9 f9[f9]f9 f9 f9 00 00 00 00 00 00 00 00 0x7ff6cff80d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x7ff6cff80e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x7ff6cff80e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x7ff6cff80f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x7ff6cff80f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==31944==ABORTING ``` This is reproducible with the 17.0.3 release: ```console $ clang-cl --version clang version 17.0.3 Target: x86_64-pc-windows-msvc Thread model: posix InstalledDir: C:\Program Files\LLVM\bin $ cmake -S llvm -B Build -G Ninja -DLLVM_USE_SANITIZER=Address -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded -DCMAKE_BUILD_TYPE=Release $ cd Build $ ninja all ```

Internal builds of the unittests with msan flagged mempcpy_test. ==6862==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x55e34d7d734a in length llvm-project/libc/src/__support/CPP/string_view.h:41:11 #1 0x55e34d7d734a in string_view llvm-project/libc/src/__support/CPP/string_view.h:71:24 #2 0x55e34d7d734a in __llvm_libc_9999_0_0_git::testing::Test::testStrEq(char const*, char const*, char const*, char const*, __llvm_libc_9999_0_0_git::testing::internal::Location) llvm-project/libc/test/UnitTest/LibcTest.cpp:284:13 #3 0x55e34d7d4e09 in LlvmLibcMempcpyTest_Simple::Run() llvm-project/libc/test/src/string/mempcpy_test.cpp:20:3 llvm#4 0x55e34d7d6dff in __llvm_libc_9999_0_0_git::testing::Test::runTests(char const*) llvm-project/libc/test/UnitTest/LibcTest.cpp:133:8 llvm#5 0x55e34d7d86e0 in main llvm-project/libc/test/UnitTest/LibcTestMain.cpp:21:10 SUMMARY: MemorySanitizer: use-of-uninitialized-value llvm-project/libc/src/__support/CPP/string_view.h:41:11 in length What's going on here is that mempcpy_test.cpp's Simple test is using ASSERT_STREQ with a partially initialized char array. ASSERT_STREQ calls Test::testStrEq which constructs a cpp:string_view. That constructor calls the private method cpp::string_view::length. When built with msan, the loop is transformed into multi-byte access, which then fails upon access. I took a look at libc++'s __constexpr_strlen which just calls __builtin_strlen(). Replacing the implementation of cpp::string_view::length with a call to __builtin_strlen() may still result in out of bounds access when the test is built with msan. It's not safe to use ASSERT_STREQ with a partially initialized array. Initialize the whole array so that the test passes.

We'd like a way to select the current thread by its thread ID (rather than its internal LLDB thread index). This PR adds a `-t` option (`--thread_id` long option) that tells the `thread select` command to interpret the `<thread-index>` argument as a thread ID. Here's an example of it working: ``` michristensen@devbig356 llvm/llvm-project (thread-select-tid) » ../Debug/bin/lldb ~/scratch/cpp/threading/a.out (lldb) target create "/home/michristensen/scratch/cpp/threading/a.out" Current executable set to '/home/michristensen/scratch/cpp/threading/a.out' (x86_64). (lldb) b 18 Breakpoint 1: where = a.out`main + 80 at main.cpp:18:12, address = 0x0000000000000850 (lldb) run Process 215715 launched: '/home/michristensen/scratch/cpp/threading/a.out' (x86_64) This is a thread, i=1 This is a thread, i=2 This is a thread, i=3 This is a thread, i=4 This is a thread, i=5 Process 215715 stopped * thread #1, name = 'a.out', stop reason = breakpoint 1.1 frame #0: 0x0000555555400850 a.out`main at main.cpp:18:12 15 for (int i = 0; i < 5; i++) { 16 pthread_create(&thread_ids[i], NULL, foo, NULL); 17 } -> 18 for (int i = 0; i < 5; i++) { 19 pthread_join(thread_ids[i], NULL); 20 } 21 return 0; (lldb) thread select 2 * thread #2, name = 'a.out' frame #0: 0x00007ffff68f9918 libc.so.6`__nanosleep + 72 libc.so.6`__nanosleep: -> 0x7ffff68f9918 <+72>: cmpq $-0x1000, %rax ; imm = 0xF000 0x7ffff68f991e <+78>: ja 0x7ffff68f9952 ; <+130> 0x7ffff68f9920 <+80>: movl %edx, %edi 0x7ffff68f9922 <+82>: movl %eax, 0xc(%rsp) (lldb) thread info thread #2: tid = 216047, 0x00007ffff68f9918 libc.so.6`__nanosleep + 72, name = 'a.out' (lldb) thread list Process 215715 stopped thread #1: tid = 215715, 0x0000555555400850 a.out`main at main.cpp:18:12, name = 'a.out', stop reason = breakpoint 1.1 * thread #2: tid = 216047, 0x00007ffff68f9918 libc.so.6`__nanosleep + 72, name = 'a.out' thread #3: tid = 216048, 0x00007ffff68f9918 libc.so.6`__nanosleep + 72, name = 'a.out' thread llvm#4: tid = 216049, 0x00007ffff68f9918 libc.so.6`__nanosleep + 72, name = 'a.out' thread llvm#5: tid = 216050, 0x00007ffff68f9918 libc.so.6`__nanosleep + 72, name = 'a.out' thread llvm#6: tid = 216051, 0x00007ffff68f9918 libc.so.6`__nanosleep + 72, name = 'a.out' (lldb) thread select 215715 error: invalid thread #215715. (lldb) thread select -t 215715 * thread #1, name = 'a.out', stop reason = breakpoint 1.1 frame #0: 0x0000555555400850 a.out`main at main.cpp:18:12 15 for (int i = 0; i < 5; i++) { 16 pthread_create(&thread_ids[i], NULL, foo, NULL); 17 } -> 18 for (int i = 0; i < 5; i++) { 19 pthread_join(thread_ids[i], NULL); 20 } 21 return 0; (lldb) thread select -t 216051 * thread llvm#6, name = 'a.out' frame #0: 0x00007ffff68f9918 libc.so.6`__nanosleep + 72 libc.so.6`__nanosleep: -> 0x7ffff68f9918 <+72>: cmpq $-0x1000, %rax ; imm = 0xF000 0x7ffff68f991e <+78>: ja 0x7ffff68f9952 ; <+130> 0x7ffff68f9920 <+80>: movl %edx, %edi 0x7ffff68f9922 <+82>: movl %eax, 0xc(%rsp) (lldb) thread select 3 * thread #3, name = 'a.out' frame #0: 0x00007ffff68f9918 libc.so.6`__nanosleep + 72 libc.so.6`__nanosleep: -> 0x7ffff68f9918 <+72>: cmpq $-0x1000, %rax ; imm = 0xF000 0x7ffff68f991e <+78>: ja 0x7ffff68f9952 ; <+130> 0x7ffff68f9920 <+80>: movl %edx, %edi 0x7ffff68f9922 <+82>: movl %eax, 0xc(%rsp) (lldb) thread select -t 216048 * thread #3, name = 'a.out' frame #0: 0x00007ffff68f9918 libc.so.6`__nanosleep + 72 libc.so.6`__nanosleep: -> 0x7ffff68f9918 <+72>: cmpq $-0x1000, %rax ; imm = 0xF000 0x7ffff68f991e <+78>: ja 0x7ffff68f9952 ; <+130> 0x7ffff68f9920 <+80>: movl %edx, %edi 0x7ffff68f9922 <+82>: movl %eax, 0xc(%rsp) (lldb) thread select --thread_id 216048 * thread #3, name = 'a.out' frame #0: 0x00007ffff68f9918 libc.so.6`__nanosleep + 72 libc.so.6`__nanosleep: -> 0x7ffff68f9918 <+72>: cmpq $-0x1000, %rax ; imm = 0xF000 0x7ffff68f991e <+78>: ja 0x7ffff68f9952 ; <+130> 0x7ffff68f9920 <+80>: movl %edx, %edi 0x7ffff68f9922 <+82>: movl %eax, 0xc(%rsp) (lldb) help thread select Change the currently selected thread. Syntax: thread select <cmd-options> <thread-index> Command Options Usage: thread select [-t] <thread-index> -t ( --thread_id ) Provide a thread ID instead of a thread index. This command takes options and free-form arguments. If your arguments resemble option specifiers (i.e., they start with a - or --), you must use ' -- ' between the end of the command options and the beginning of the arguments. (lldb) c Process 215715 resuming Process 215715 exited with status = 0 (0x00000000) ```

This PR adds support for thread names in lldb on Windows. ``` (lldb) thr list Process 2960 stopped thread llvm#53: tid = 0x03a0, 0x00007ff84582db34 ntdll.dll`NtWaitForMultipleObjects + 20 thread llvm#29: tid = 0x04ec, 0x00007ff845830a14 ntdll.dll`NtWaitForAlertByThreadId + 20, name = 'SPUW.6' thread llvm#89: tid = 0x057c, 0x00007ff845830a14 ntdll.dll`NtWaitForAlertByThreadId + 20, name = 'PPU[0x1000019] physics[main]' thread #3: tid = 0x0648, 0x00007ff843c2cafe combase.dll`InternalDoATClassCreate + 39518 thread llvm#93: tid = 0x0688, 0x00007ff845830a14 ntdll.dll`NtWaitForAlertByThreadId + 20, name = 'PPU[0x100501d] uMovie::StreamingThread' thread #1: tid = 0x087c, 0x00007ff842e7a104 win32u.dll`NtUserMsgWaitForMultipleObjectsEx + 20 thread llvm#96: tid = 0x0890, 0x00007ff845830a14 ntdll.dll`NtWaitForAlertByThreadId + 20, name = 'PPU[0x1002020] HLE Video Decoder' <...> ```

When shl is folded in compare instruction, a miscompilation occurs when the CMP instruction is also sign-extended. For the following IR: %op3 = shl i8 %op2, 3 %tmp3 = icmp eq i8 %tmp2, %op3 It used to generate cmp w8, w9, sxtb #3 which means sign extend w9, shift left by 3, and then compare with the value in w8. However, the original intention of the IR would require `%op2` to first shift left before extending the operands in the comparison operation . Moreover, if sign extension is used instead of zero extension, the sample test would miscompile. This PR creates a fix for the issue, more specifically to not fold the left shift into the CMP instruction, and to create a zero-extended value rather than a sign-extended value.

@0

…lvm#80904)" This reverts commit b1ac052. This commit breaks coroutine splitting for non-swift calling convention functions. In this example: ```ll ; ModuleID = 'repro.ll' source_filename = "stdlib/test/runtime/test_llcl.mojo" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" @0 = internal constant { i32, i32 } { i32 trunc (i64 sub (i64 ptrtoint (ptr @craSH to i64), i64 ptrtoint (ptr getelementptr inbounds ({ i32, i32 }, ptr @0, i32 0, i32 1) to i64)) to i32), i32 64 } define dso_local void @af_suspend_fn(ptr %0, i64 %1, ptr %2) #0 { ret void } define dso_local void @craSH(ptr %0) #0 { %2 = call token @llvm.coro.id.async(i32 64, i32 8, i32 0, ptr @0) %3 = call ptr @llvm.coro.begin(token %2, ptr null) %4 = getelementptr inbounds { ptr, { ptr, ptr }, i64, { ptr, i1 }, i64, i64 }, ptr poison, i32 0, i32 0 %5 = call ptr @llvm.coro.async.resume() store ptr %5, ptr %4, align 8 %6 = call { ptr, ptr, ptr } (i32, ptr, ptr, ...) @llvm.coro.suspend.async.sl_p0p0p0s(i32 0, ptr %5, ptr @ctxt_proj_fn, ptr @af_suspend_fn, ptr poison, i64 -1, ptr poison) ret void } define dso_local ptr @ctxt_proj_fn(ptr %0) #0 { ret ptr %0 } ; Function Attrs: nomerge nounwind declare { ptr, ptr, ptr } @llvm.coro.suspend.async.sl_p0p0p0s(i32, ptr, ptr, ...) #1 ; Function Attrs: nounwind declare token @llvm.coro.id.async(i32, i32, i32, ptr) #2 ; Function Attrs: nounwind declare ptr @llvm.coro.begin(token, ptr writeonly) #2 ; Function Attrs: nomerge nounwind declare ptr @llvm.coro.async.resume() #1 attributes #0 = { "target-features"="+adx,+aes,+avx,+avx2,+bmi,+bmi2,+clflushopt,+clwb,+clzero,+crc32,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+mwaitx,+pclmul,+pku,+popcnt,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sha,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+sse4a,+ssse3,+vaes,+vpclmulqdq,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" } attributes #1 = { nomerge nounwind } attributes #2 = { nounwind } ``` This verifier crashes after the `coro-split` pass with ``` cannot guarantee tail call due to mismatched parameter counts musttail call void @af_suspend_fn(ptr poison, i64 -1, ptr poison) LLVM ERROR: Broken function PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: opt ../../../reduced.ll -O0 #0 0x00007f1d89645c0e __interceptor_backtrace.part.0 /build/gcc-11-XeT9lY/gcc-11-11.4.0/build/x86_64-linux-gnu/libsanitizer/asan/../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4193:28 #1 0x0000556d94d254f7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:22 #2 0x0000556d94d19a2f llvm::sys::RunSignalHandlers() /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/Signals.cpp:105:20 #3 0x0000556d94d1aa42 SignalHandler(int) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/Unix/Signals.inc:371:36 llvm#4 0x00007f1d88e42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520) llvm#5 0x00007f1d88e969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76 llvm#6 0x00007f1d88e969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10 llvm#7 0x00007f1d88e969fc pthread_kill ./nptl/pthread_kill.c:89:10 llvm#8 0x00007f1d88e42476 gsignal ./signal/../sysdeps/posix/raise.c:27:6 llvm#9 0x00007f1d88e287f3 abort ./stdlib/abort.c:81:7 llvm#10 0x0000556d8944be01 std::vector<llvm::json::Value, std::allocator<llvm::json::Value>>::size() const /usr/include/c++/11/bits/stl_vector.h:919:40 llvm#11 0x0000556d8944be01 bool std::operator==<llvm::json::Value, std::allocator<llvm::json::Value>>(std::vector<llvm::json::Value, std::allocator<llvm::json::Value>> const&, std::vector<llvm::json::Value, std::allocator<llvm::json::Value>> const&) /usr/include/c++/11/bits/stl_vector.h:1893:23 llvm#12 0x0000556d8944be01 llvm::json::operator==(llvm::json::Array const&, llvm::json::Array const&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/Support/JSON.h:572:69 llvm#13 0x0000556d8944be01 llvm::json::operator==(llvm::json::Value const&, llvm::json::Value const&) (.cold) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/JSON.cpp:204:28 llvm#14 0x0000556d949ed2bd llvm::report_fatal_error(char const*, bool) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/ErrorHandling.cpp:82:70 llvm#15 0x0000556d8e37e876 llvm::SmallVectorBase<unsigned int>::size() const /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallVector.h:91:32 llvm#16 0x0000556d8e37e876 llvm::SmallVectorTemplateCommon<llvm::DiagnosticInfoOptimizationBase::Argument, void>::end() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallVector.h:282:41 llvm#17 0x0000556d8e37e876 llvm::SmallVector<llvm::DiagnosticInfoOptimizationBase::Argument, 4u>::~SmallVector() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallVector.h:1215:24 llvm#18 0x0000556d8e37e876 llvm::DiagnosticInfoOptimizationBase::~DiagnosticInfoOptimizationBase() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/DiagnosticInfo.h:413:7 llvm#19 0x0000556d8e37e876 llvm::DiagnosticInfoIROptimization::~DiagnosticInfoIROptimization() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/DiagnosticInfo.h:622:7 llvm#20 0x0000556d8e37e876 llvm::OptimizationRemark::~OptimizationRemark() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/DiagnosticInfo.h:689:7 llvm#21 0x0000556d8e37e876 operator() /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Transforms/Coroutines/CoroSplit.cpp:2213:14 llvm#22 0x0000556d8e37e876 emit<llvm::CoroSplitPass::run(llvm::LazyCallGraph::SCC&, llvm::CGSCCAnalysisManager&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)::<lambda()> > /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h:83:12 llvm#23 0x0000556d8e37e876 llvm::CoroSplitPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Transforms/Coroutines/CoroSplit.cpp:2212:13 llvm#24 0x0000556d8c36ecb1 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CoroSplitPass, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3 llvm#25 0x0000556d91c1a84f llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:90:12 llvm#26 0x0000556d8c3690d1 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3 llvm#27 0x0000556d91c2162d llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:278:18 llvm#28 0x0000556d8c369035 llvm::detail::PassModel<llvm::Module, llvm::ModuleToPostOrderCGSCCPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3 llvm#29 0x0000556d9457abc5 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManager.h:247:20 llvm#30 0x0000556d8e30979e llvm::CoroConditionalWrapper::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Transforms/Coroutines/CoroConditionalWrapper.cpp:19:74 llvm#31 0x0000556d8c365755 llvm::detail::PassModel<llvm::Module, llvm::CoroConditionalWrapper, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3 llvm#32 0x0000556d9457abc5 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManager.h:247:20 llvm#33 0x0000556d89818556 llvm::SmallPtrSetImplBase::isSmall() const /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:196:33 llvm#34 0x0000556d89818556 llvm::SmallPtrSetImplBase::~SmallPtrSetImplBase() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:84:17 llvm#35 0x0000556d89818556 llvm::SmallPtrSetImpl<llvm::AnalysisKey*>::~SmallPtrSetImpl() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:321:7 llvm#36 0x0000556d89818556 llvm::SmallPtrSet<llvm::AnalysisKey*, 2u>::~SmallPtrSet() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:427:7 llvm#37 0x0000556d89818556 llvm::PreservedAnalyses::~PreservedAnalyses() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/Analysis.h:109:7 llvm#38 0x0000556d89818556 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) /home/ubuntu/modular/third-party/llvm-project/llvm/tools/opt/NewPMDriver.cpp:532:10 llvm#39 0x0000556d897e3939 optMain /home/ubuntu/modular/third-party/llvm-project/llvm/tools/opt/optdriver.cpp:737:27 llvm#40 0x0000556d89455461 main /home/ubuntu/modular/third-party/llvm-project/llvm/tools/opt/opt.cpp:25:33 llvm#41 0x00007f1d88e29d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16 llvm#42 0x00007f1d88e29e40 call_init ./csu/../csu/libc-start.c:128:20 llvm#43 0x00007f1d88e29e40 __libc_start_main ./csu/../csu/libc-start.c:379:5 llvm#44 0x0000556d897b6335 _start (/home/ubuntu/modular/.derived/third-party/llvm-project/build-relwithdebinfo-asan/bin/opt+0x150c335) Aborted (core dumped)

...which caused issues like > ==42==ERROR: AddressSanitizer failed to deallocate 0x32 (50) bytes at address 0x117e0000 (error code: 28) > ==42==Cannot dump memory map on emscriptenAddressSanitizer: CHECK failed: sanitizer_common.cpp:81 "((0 && "unable to unmmap")) != (0)" (0x0, 0x0) (tid=288045824) > #0 0x14f73b0c in __asan::CheckUnwind()+0x14f73b0c (this.program+0x14f73b0c) > #1 0x14f8a3c2 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long)+0x14f8a3c2 (this.program+0x14f8a3c2) > #2 0x14f7d6e1 in __sanitizer::ReportMunmapFailureAndDie(void*, unsigned long, int, bool)+0x14f7d6e1 (this.program+0x14f7d6e1) > #3 0x14f81fbd in __sanitizer::UnmapOrDie(void*, unsigned long)+0x14f81fbd (this.program+0x14f81fbd) > llvm#4 0x14f875df in __sanitizer::SuppressionContext::ParseFromFile(char const*)+0x14f875df (this.program+0x14f875df) > llvm#5 0x14f74eab in __asan::InitializeSuppressions()+0x14f74eab (this.program+0x14f74eab) > llvm#6 0x14f73a1a in __asan::AsanInitInternal()+0x14f73a1a (this.program+0x14f73a1a) when trying to use an ASan suppressions file under Emscripten: Even though it would be considered OK by SUSv4, the Emscripten runtime states "We don't support partial munmapping" (see <emscripten-core/emscripten@f4115eb> "Implement MAP_ANONYMOUS on top of malloc in STANDALONE_WASM mode (llvm#16289)"). Co-authored-by: Stephan Bergmann <stephan.bergmann@allotropia.de>

…erSize (llvm#67657)" This reverts commit f0b3654. This commit triggers UB by reading an uninitialized variable. `UP.PartialThreshold` is used uninitialized in `getUnrollingPreferences()` when it is called from `LoopVectorizationPlanner::executePlan()`. In this case the `UP` variable is created on the stack and its fields are not initialized. ``` ==8802==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x557c0b081b99 in llvm::BasicTTIImplBase<llvm::X86TTIImpl>::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h #1 0x557c0b07a40c in llvm::TargetTransformInfo::Model<llvm::X86TTIImpl>::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h:2277:17 #2 0x557c0f5d69ee in llvm::TargetTransformInfo::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) const llvm-project/llvm/lib/Analysis/TargetTransformInfo.cpp:387:19 #3 0x557c0e6b96a0 in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>> const*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7624:7 llvm#4 0x557c0e6e4b63 in llvm::LoopVectorizePass::processLoop(llvm::Loop*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10253:13 llvm#5 0x557c0e6f2429 in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo*, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AssumptionCache&, llvm::LoopAccessInfoManager&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10344:30 llvm#6 0x557c0e6f2f97 in llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10383:9 [...] Uninitialized value was created by an allocation of 'UP' in the stack frame #0 0x557c0e6b961e in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>> const*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7623:3 ```

dstutt and others added 30 commits October 10, 2019 14:25

[AMDGPU] Temporarily disabled immarg check in verifier

2635ffd

LLPC has .ll file library functions that break this verifier check. We need to fix those before re-enabling the check. Change-Id: I4c000793a805a50f1d91d1ae55c567a356e72519

[AMDGPU] Add Wave32 Support to Scratch Bounds Checking Pass

cd87f1a

Implement appropriate register allocation and exec mask usage for wave32 scenarios. Change-Id: I844018f6c07fdda46366af51654017fb4b654d8a

[AMDGCN] Support wave32/64 in waterfall intrinsics

6efb5e0

Extended test to check wave32 and gfx10 Change-Id: I620c6d9e737ddccdfd3494abf72632d7ecc4a53f

[AMDGPU] Fix getDwordOffset in buffer mem merging pass

dcf74f4

Change-Id: I3557477344135f7dbece33a2945bcb9f4063c10f

Merge MIMG instructions

dc3170a

Extend LoadStoreOptimizer to handle IMAGE_LOAD and IMAGE_SAMPLE. Change-Id: Id49c992b9781254e39e1352125f5ffb212fe4f24

Fixed "Merge MIMG instructions"

ce17a44

Fix a breakage in areMemAccessesTriviallyDisjoint, caused by trying to merge a non-load MIMG instruction IMAGE_GET_RESINFO. Only MIMG that loads from memory should be considered. Change-Id: I012435534b3e6161ee7feb686d4711097d2ab980

Reinstate "AMDGPU: Be explicit about whether the high-word in SI_PC_A…

d702c00

…DD_REL_OFFSET is 0" This reverts commit ad3ea6fe0eedc9fe48b96453cfed51f8c1dd79de.

gfx10 atomic opt test fixup

c62b84d

Change-Id: I3dddc18d5dea0fd1050633e35d0ac463a9ed26ca

Guzhu-AMD and others added 2 commits January 13, 2022 22:53

Overwritten with: 1a7b346 Merged main:4edb9983cb8c into amd-gfx:f5a46…

c20ba76

…9632404 Based on upstream llvm : 4edb998 [SelectionDAG] treat X constrained labels as i for asm Added AMD modification notices and removed some GPL files. Change-Id: I46d09041fccf532e2c31320842d3f932e0d088a2

Make conditional discard flags available externally

f002213

This is so that we can set/access these flags in other `.cpp` files.

jayfoad reviewed Jan 27, 2022

View reviewed changes

trenouf force-pushed the amd-gfx-gpuopen-dev branch from 5b3ce20 to 17549ba Compare January 25, 2023 12:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make conditional discard flags available externally #3

Make conditional discard flags available externally #3

kuhar commented Jan 26, 2022

jayfoad Jan 27, 2022

perlfu Jan 27, 2022

kuhar Jan 27, 2022

kuhar Jan 27, 2022 •

edited

Loading

Make conditional discard flags available externally #3

Are you sure you want to change the base?

Make conditional discard flags available externally #3

Conversation

kuhar commented Jan 26, 2022

jayfoad Jan 27, 2022

Choose a reason for hiding this comment

perlfu Jan 27, 2022

Choose a reason for hiding this comment

kuhar Jan 27, 2022

Choose a reason for hiding this comment

kuhar Jan 27, 2022 • edited Loading

Choose a reason for hiding this comment

kuhar Jan 27, 2022 •

edited

Loading