Hello!

Ran into an issue when running the service in a release configuration on Linux via Docker.

After some digging, I believe I've isolated the issue to when the cluster is initialized. I have a reproduction of the issue with a simple main.swift:
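(The exact file lives in the repo linked further down; in essence it only installs Backtrace and initializes the ClusterSystem. A minimal sketch along these lines should be enough; the module and API names come from swift-backtrace and the DistributedCluster 1.x package as I understand them, so treat it as illustrative rather than the exact file:)

import Backtrace
import DistributedCluster

// swift-backtrace, so a signal 11 prints the backtrace shown below.
Backtrace.install()

// Initializing the cluster system alone appears to be enough to trigger the crash.
// The defaults bind to 127.0.0.1:7337, which matches the log output below.
let system = await ClusterSystem("TestRunCluster")

// Park the process so the node stays up; any long sleep works for the repro.
try await Task.sleep(for: .seconds(60 * 60))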
When running with Backtrace installed, I get the following:
Received signal 11. Backtrace:
2023-04-23T07:46:26+0000 info TestRunCluster : cluster/node=sact://TestRunCluster@127.0.0.1:7337 [DistributedCluster] ClusterSystem [TestRunCluster] initialized, listening on: sact://TestRunCluster@127.0.0.1:7337: _ActorRef<ClusterShell.Message>(/system/cluster)
2023-04-23T07:46:26+0000 info TestRunCluster : cluster/node=sact://TestRunCluster@127.0.0.1:7337 [DistributedCluster] Setting in effect: .autoLeaderElection: LeadershipSelectionSettings(underlying: DistributedCluster.ClusterSystemSettings.LeadershipSelectionSettings.(unknown context at $aaaad5a3b1dc)._LeadershipSelectionSettings.lowestReachable(minNumberOfMembers: 2))
2023-04-23T07:46:26+0000 info TestRunCluster : cluster/node=sact://TestRunCluster@127.0.0.1:7337 [DistributedCluster] Setting in effect: .downingStrategy: DowningStrategySettings(underlying: DistributedCluster.DowningStrategySettings.(unknown context at $aaaad5a3979c)._DowningStrategySettings.timeout(DistributedCluster.TimeoutBasedDowningStrategySettings(downUnreachableMembersAfter: 1.0 seconds)))
2023-04-23T07:46:26+0000 info TestRunCluster : cluster/node=sact://TestRunCluster@127.0.0.1:7337 [DistributedCluster] Setting in effect: .onDownAction: OnDownActionStrategySettings(underlying: DistributedCluster.OnDownActionStrategySettings.(unknown context at $aaaad5a3971c)._OnDownActionStrategySettings.gracefulShutdown(delay: 3.0 seconds))
2023-04-23T07:46:26+0000 info TestRunCluster : actor/path=/system/cluster cluster/node=sact://TestRunCluster@127.0.0.1:7337 [DistributedCluster] Binding to: [sact://TestRunCluster@127.0.0.1:7337]
2023-04-23T07:46:26+0000 info TestRunCluster : actor/path=/system/cluster/leadership cluster/node=sact://TestRunCluster@127.0.0.1:7337 leadership/election=DistributedCluster.Leadership.LowestReachableMember [DistributedCluster] Not enough members [1/2] to run election, members: [Member(sact://TestRunCluster:2481186327279040895@127.0.0.1:7337, status: joining, reachability: reachable)]
2023-04-23T07:46:26+0000 info TestRunCluster : actor/path=/system/cluster cluster/node=sact://TestRunCluster@127.0.0.1:7337 [DistributedCluster] Bound to [IPv4]127.0.0.1/127.0.0.1:7337
Since the backtrace only showed a signal 11, I tried using AddressSanitizer to see if I could get more information, which ended up giving me:
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000000000 bp 0xffff819de570 sp 0xffff819de560 T3)
==1==Hint: pc points to the zero page.
==1==The signal is caused by a READ memory access.
==1==Hint: address points to the zero page.
2023-04-23T18:49:29+0000 info TestRunCluster : cluster/node=sact://TestRunCluster@127.0.0.1:7337 [DistributedCluster] ClusterSystem [TestRunCluster] initialized, listening on: sact://TestRunCluster@127.0.0.1:7337: _ActorRef<ClusterShell.Message>(/system/cluster)
    #0 0x0 (<unknown module>)
    #1 0xaaaac2c62014 (/CrashingCluster+0x1e82014)
    #2 0xaaaac2c62754 (/CrashingCluster+0x1e82754)
    #3 0xaaaac2c2008c (/CrashingCluster+0x1e4008c)
    #4 0xaaaac2c1fdf4 (/CrashingCluster+0x1e3fdf4)
    #5 0xaaaac2c2c098 (/CrashingCluster+0x1e4c098)
    #6 0xffff85f7d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) (BuildId: f37f3aa07c797e333fd106472898d361f71798f5)
    #7 0xffff85fe5d18 (/lib/aarch64-linux-gnu/libc.so.6+0xe5d18) (BuildId: f37f3aa07c797e333fd106472898d361f71798f5)
AddressSanitizer can not provide additional info.
2023-04-23T18:49:29+0000 info TestRunCluster : cluster/node=sact://TestRunCluster@127.0.0.1:7337 [DistributedCluster] Setting in effect: .autoLeaderElection: LeadershipSelectionSettings(underlying: DistributedCluster.ClusterSystemSettings.LeadershipSelectionSettings.(unknown context at $aaaac374b1dc)._LeadershipSelectionSettings.lowestReachable(minNumberOfMembers: 2))
SUMMARY: AddressSanitizer: SEGV (<unknown module>)
Thread T3 created by T1 here:
    #0 0xaaaac149fb68 (/CrashingCluster+0x6bfb68)
    #1 0xaaaac2c28478 (/CrashingCluster+0x1e48478)
    #2 0xaaaac2c2b694 (/CrashingCluster+0x1e4b694)
    #3 0xaaaac2c24c04 (/CrashingCluster+0x1e44c04)
    #4 0xaaaac2c2c098 (/CrashingCluster+0x1e4c098)
    #5 0xffff85f7d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) (BuildId: f37f3aa07c797e333fd106472898d361f71798f5)
    #6 0xffff85fe5d18 (/lib/aarch64-linux-gnu/libc.so.6+0xe5d18) (BuildId: f37f3aa07c797e333fd106472898d361f71798f5)
Thread T1 created by T0 here:
2023-04-23T18:49:29+0000 info TestRunCluster : cluster/node=sact://TestRunCluster@127.0.0.1:7337 [DistributedCluster] Setting in effect: .downingStrategy: DowningStrategySettings(underlying: DistributedCluster.DowningStrategySettings.(unknown context at $aaaac374979c)._DowningStrategySettings.timeout(DistributedCluster.TimeoutBasedDowningStrategySettings(downUnreachableMembersAfter: 1.0 seconds)))
    #0 0xaaaac149fb68 (/CrashingCluster+0x6bfb68)
    #1 0xaaaac2c28478 (/CrashingCluster+0x1e48478)
    #2 0xaaaac2c634cc (/CrashingCluster+0x1e834cc)
    #3 0xaaaac2c6293c (/CrashingCluster+0x1e8293c)
    #4 0xaaaac2c62014 (/CrashingCluster+0x1e82014)
    #5 0xaaaac2c62754 (/CrashingCluster+0x1e82754)
    #6 0xaaaac18ce5b4 (/CrashingCluster+0xaee5b4)
    #7 0xffff85f273f8 (/lib/aarch64-linux-gnu/libc.so.6+0x273f8) (BuildId: f37f3aa07c797e333fd106472898d361f71798f5)
    #8 0xffff85f274c8 (/lib/aarch64-linux-gnu/libc.so.6+0x274c8) (BuildId: f37f3aa07c797e333fd106472898d361f71798f5)
    #9 0xaaaac143efac (/CrashingCluster+0x65efac)
2023-04-23T18:49:29+0000 info TestRunCluster : cluster/node=sact://TestRunCluster@127.0.0.1:7337 [DistributedCluster] Setting in effect: .onDownAction: OnDownActionStrategySettings(underlying: DistributedCluster.OnDownActionStrategySettings.(unknown context at $aaaac374971c)._OnDownActionStrategySettings.gracefulShutdown(delay: 3.0 seconds))
==1==ABORTING
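(Side note: the frames above are unsymbolicated, presumably because the minimal ubuntu:jammy run image has no symbolizer. If symbol names would help, my understanding is that installing LLVM in the run image and pointing ASan at its symbolizer should work, roughly like this; the llvm-14 path is an assumption for an apt-installed LLVM on jammy:)

apt-get install -y llvm-14
ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-14/bin/llvm-symbolizer ASAN_OPTIONS=symbolize=1 ./CrashingCluster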
As far as I can tell, the problem only seems to arise when running on Linux with this Dockerfile:
# ================================
# Build image
# ================================
FROM swift:5.8-jammy as builder
RUN mkdir /workspace
WORKDIR /workspace
COPY . /workspace
RUN swift build --sanitize=address -c release -Xswiftc -g --static-swift-stdlib
# ================================
# Run image
# ================================
FROM ubuntu:jammy
COPY --from=builder /workspace/.build/release/CrashingCluster /
EXPOSE 7337
ENTRYPOINT ["./CrashingCluster"]
This reproduction, along with the Dockerfile, can be found in this repo, if it helps.
Thanks for all the work on this!
We continued looking into this and strongly suspect that this is a bug in Swift 5.8 when --static-swift-stdlib is combined with ASan (AddressSanitizer).

I'll quadruple-check some more, but that's our strong suspicion so far.

It also does not reproduce on Swift 5.9, and we suspect this might be the fix for it: apple/swift#65254
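If that suspicion holds, a workaround on 5.8 should be to drop one of the two options when building the reproducer; an untested sketch:

# Keep ASan but link the stdlib dynamically; the run image then needs the Swift
# runtime libraries, e.g. by basing it on swift:5.8-jammy-slim instead of ubuntu:jammy.
swift build --sanitize=address -c release -Xswiftc -g

# Or keep --static-swift-stdlib and drop ASan.
swift build -c release -Xswiftc -g --static-swift-stdlib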