
compiler: fix races in link queue #24171

Merged (2 commits) on Jun 14, 2025

mlugg (Member) commented Jun 13, 2025

I messed up atomic orderings on this variable because they changed in a local refactor at some point. We need to always release on the store and acquire on the loads so that a linker thread observing .ready sees the stored MIR.

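A minimal sketch of the publish/observe pattern the description relies on, assuming Zig 0.14-era `std.atomic.Value` and `std.Thread` APIs. `Status`, `Slot`, `codegen` and `linker` are hypothetical stand-ins, not the compiler's actual link queue types. The producer writes a plain (non-atomic) payload and then stores `.ready` with `.release`. The consumer spins on an `.acquire` load until it sees `.ready`, after which the payload write is guaranteed to be visible. If either side used `.monotonic` instead, the flag itself would still update atomically, but nothing would order the payload write relative to it.

```zig
const std = @import("std");

// Hypothetical status flag guarding a published payload.
const Status = enum(u8) { pending, ready };

// Hypothetical stand-in for a queue slot holding generated MIR.
const Slot = struct {
    mir: u64 = 0, // plain, non-atomic payload
    status: std.atomic.Value(Status) = std.atomic.Value(Status).init(.pending),
};

// Producer (codegen thread): write the payload, then publish it.
fn codegen(slot: *Slot) void {
    slot.mir = 42;
    // .release: every write before this store (the payload above) becomes
    // visible to any thread whose .acquire load reads .ready.
    slot.status.store(.ready, .release);
}

// Consumer (linker thread): wait for .ready, then read the payload.
fn linker(slot: *Slot, out: *u64) void {
    // .acquire pairs with the .release store in codegen().
    while (slot.status.load(.acquire) != .ready) std.atomic.spinLoopHint();
    out.* = slot.mir; // guaranteed to observe the published value
}

pub fn main() !void {
    var slot: Slot = .{};
    var seen: u64 = 0;
    const t = try std.Thread.spawn(.{}, linker, .{ &slot, &seen });
    codegen(&slot);
    t.join();
    std.debug.print("linker saw mir = {d}\n", .{seen});
}
```

Running it should print `linker saw mir = 42`. Note that on x86 this program would usually behave the same even with weaker orderings, which is exactly why such bugs can hide: the ordering also constrains what the compiler (here, LLVM) may reorder, not just the hardware.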
mlugg enabled auto-merge (rebase) June 13, 2025 18:08

mlugg (Member, Author) commented Jun 13, 2025

GitHub in its infinite wisdom seems to have incinerated the corresponding logs, but there's a very remote possibility that this bug caused the x86_64-linux-debug-llvm CI failure in https://github.com/ziglang/zig/runs/44014228031. It would have required LLVM to reorder the accesses (despite Debug mode), since on x86 every plain load already has acquire semantics and every plain store already has release semantics, plus some very unlucky thread scheduling. Still, it's... literally the only way I could see that failure happening.

EDIT: nope, that was happening because of a really obvious bug which I somehow missed. Thanks @jacobly0 for picking up on it -- fix pushed to this PR.

jacobly0 (Member) commented Jun 13, 2025

I did some sleuthing and unincinerated the log:

2025-06-13T01:26:55.1382433Z + stage3-debug/bin/zig build test docs --maxrss 21000000000 -Dlldb=/home/ci/deps/lldb-zig/Debug-e0a42bb34/bin/lldb -fqemu -fwasmtime -Dstatic-llvm -Dskip-freebsd -Dskip-netbsd -Dskip-windows -Dskip-macos -Dtarget=native-native-musl --search-prefix /home/ci/deps/zig+llvm+lld+clang-x86_64-linux-musl-0.15.0-dev.233+7c85dc460 --zig-lib-dir /home/ci/actions-runner1/_work/zig/zig/build-debug-llvm/../lib -Denable-superhtml
2025-06-13T01:32:03.8840387Z test
2025-06-13T01:32:03.8840866Z +- test-cases
2025-06-13T01:32:03.8841375Z    +- run safety.exact division failure - vectors
2025-06-13T01:32:03.8842241Z       +- compile exe safety.exact division failure - vectors Debug native failure
2025-06-13T01:32:03.8843255Z error: thread 929707 panic: reached unreachable code
2025-06-13T01:32:03.8844315Z /home/ci/actions-runner1/_work/zig/zig/lib/std/debug.zig:548:14: 0x5a3c9fd in assert (zig)
2025-06-13T01:32:03.8845291Z     if (!ok) unreachable; // assertion failure
2025-06-13T01:32:03.8845868Z              ^
2025-06-13T01:32:03.8846724Z /home/ci/actions-runner1/_work/zig/zig/src/link/Queue.zig:129:11: 0x62504ae in mirReady (zig)
2025-06-13T01:32:03.8862971Z     assert(mir.status.load(.monotonic) != .pending);
2025-06-13T01:32:03.8863596Z           ^
2025-06-13T01:32:03.8864459Z /home/ci/actions-runner1/_work/zig/zig/src/Zcu/PerThread.zig:4403:38: 0x5eebb1b in runCodegen (zig)
2025-06-13T01:32:03.8865625Z     zcu.comp.link_task_queue.mirReady(zcu.comp, out);
2025-06-13T01:32:03.8866935Z                                      ^
2025-06-13T01:32:03.8867946Z /home/ci/actions-runner1/_work/zig/zig/src/Compilation.zig:5367:18: 0x624dc94 in workerZcuCodegen (zig)
2025-06-13T01:32:03.8869052Z     pt.runCodegen(func_index, &air, out);
2025-06-13T01:32:03.8869606Z                  ^
2025-06-13T01:32:03.8870326Z /home/ci/actions-runner1/_work/zig/zig/lib/std/Thread/Pool.zig:180:50: 0x624ddb2 in runFn (zig)
2025-06-13T01:32:03.8871276Z             @call(.auto, func, .{id.?} ++ closure.arguments);
2025-06-13T01:32:03.8871857Z                                                  ^
2025-06-13T01:32:03.8872890Z /home/ci/actions-runner1/_work/zig/zig/lib/std/Thread/Pool.zig:293:27: 0x61c99f7 in worker (zig)
2025-06-13T01:32:03.8873901Z             runnable.runFn(runnable, id);
2025-06-13T01:32:03.8874426Z                           ^
2025-06-13T01:32:03.8875352Z /home/ci/actions-runner1/_work/zig/zig/lib/std/Thread.zig:510:13: 0x5e78e5d in callFn__anon_196858 (zig)
2025-06-13T01:32:03.8876382Z             @call(.auto, f, args);
2025-06-13T01:32:03.8876873Z             ^
2025-06-13T01:32:03.8877627Z /home/ci/actions-runner1/_work/zig/zig/lib/std/Thread.zig:782:30: 0x5cf99c4 in entryFn (zig)
2025-06-13T01:32:03.8878798Z                 return callFn(f, args_ptr.*);
2025-06-13T01:32:03.8879398Z                              ^
2025-06-13T01:32:03.8880419Z lib/libc/musl/src/thread/pthread_create.c:207:17: 0xeb2af3f in start (lib/libc/musl/src/thread/pthread_create.c)
2025-06-13T01:32:03.8881998Z lib/libc/musl/src/thread/x86_64/clone.s:23:0: 0xeb2c34f in ??? (lib/libc/musl/src/thread/x86_64/clone.s)
2025-06-13T01:32:03.8883175Z Unwind error at address `exe:0xeb2c34f` (error.MissingFDE), trace may be incomplete
2025-06-13T01:32:03.8883876Z 
2025-06-13T01:32:03.8883889Z 
2025-06-13T01:32:03.8884161Z error: the following command terminated unexpectedly:
2025-06-13T01:32:03.9005964Z /home/ci/actions-runner1/_work/zig/zig/build-debug-llvm/stage3-debug/bin/zig build-exe -fno-llvm -fno-lld -ODebug -Mroot=/home/ci/actions-runner1/_work/zig/zig/zig-local-cache/o/d14253cb9361aec75981f588c4887bdc/tmp.zig --cache-dir /home/ci/actions-runner1/_work/zig/zig/zig-local-cache --global-cache-dir /home/ci/actions-runner1/_work/

Did you know that allocators reuse addresses? If not, then don't feel
bad, because apparently I don't either! This dumb mistake was probably
responsible for the CI failures on `master` yesterday.
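The commit message doesn't show the buggy code, so the following is only a generic Zig sketch of the hazard it names, not the actual link queue fix. Once an allocation is freed, the allocator may hand the same address to an unrelated allocation, so anything that remembers the old pointer (as a map key, an "already seen" marker, and so on) can mistake the new object for the old one. `sameAddressAfterFree` is a hypothetical helper, and it assumes Zig 0.14-era std APIs. It uses `std.heap.FixedBufferAllocator` only because that allocator reclaims its most recent allocation on free, which makes the reuse deterministic; a general-purpose allocator may or may not reuse a given address, which is what makes this kind of bug intermittent.

```zig
const std = @import("std");

// Hypothetical illustration: an address is only a valid identity while the
// object it points to is alive.
pub fn sameAddressAfterFree() !bool {
    var buf: [64]u8 align(8) = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buf);
    const allocator = fba.allocator();

    const first = try allocator.create(u64);
    first.* = 1;
    const remembered = @intFromPtr(first); // stale "identity" key
    allocator.destroy(first);

    const second = try allocator.create(u64); // a different, unrelated object
    second.* = 2;
    defer allocator.destroy(second);

    // FixedBufferAllocator rolls back its most recent allocation on free,
    // so `second` lands exactly where `first` used to be.
    return @intFromPtr(second) == remembered;
}

pub fn main() !void {
    const reused = try sameAddressAfterFree();
    std.debug.print("address reused: {s}\n", .{if (reused) "yes" else "no"});
}
```

Any code that compared `remembered` against live pointers would now treat `second` as if it were `first`. Keying on something that is never recycled (or dropping the stale entry when the object is freed) avoids the confusion.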
mlugg disabled auto-merge June 13, 2025 21:06
mlugg changed the title from "compiler: fix atomic orderings" to "compiler: fix races in link queue" Jun 13, 2025
mlugg enabled auto-merge June 13, 2025 21:10
mlugg merged commit 095c956 into ziglang:master Jun 14, 2025
10 checks passed