When ClusterSharding_should_work_in_single_node_cluster() is executed and the first node is joining the cluster, it calls CreateCoordinator(), which sometimes fails during the CoordinatorProps call like this:
[Node #2(first)][Node2:first][FAIL] Akka.Cluster.Sharding.Tests.DDataClusterShardingSpec.ClusterSharding_specs
[Node #2(first)][Node2:first][FAIL-EXCEPTION] Type: System.NullReferenceException
[Node #2(first)]-->[Node2:first][FAIL-EXCEPTION] Message: Object reference not set to an instance of an object.
[Node #2(first)]-->[Node2:first][FAIL-EXCEPTION] StackTrace: at Hocon.Config.WithFallback(Config fallback)
[Node #2(first)] at Akka.Cluster.Sharding.Tests.ClusterShardingSpec.CoordinatorProps(String typeName, Boolean rebalanceEntities, Boolean rememberEntities) in D:\a\1\s\src\contrib\cluster\Akka.Cluster.Sharding.Tests.MultiNode\ClusterShardingSpec.cs:line 486
[Node #2(first)] at Akka.Cluster.Sharding.Tests.ClusterShardingSpec.CreateCoordinator() in D:\a\1\s\src\contrib\cluster\Akka.Cluster.Sharding.Tests.MultiNode\ClusterShardingSpec.cs:line 467
[Node #2(first)] at Akka.Cluster.Sharding.Tests.ClusterShardingSpec.Join(RoleName from, RoleName to) in D:\a\1\s\src\contrib\cluster\Akka.Cluster.Sharding.Tests.MultiNode\ClusterShardingSpec.cs:line 446
[Node #2(first)] at Akka.Cluster.Sharding.Tests.ClusterShardingSpec.<ClusterSharding_should_work_in_single_node_cluster>b__28_0() in D:\a\1\s\src\contrib\cluster\Akka.Cluster.Sharding.Tests.MultiNode\ClusterShardingSpec.cs:line 587
[Node #2(first)] at Akka.TestKit.TestKitBase.<>c__DisplayClass150_0.<Within>b__0() in D:\a\1\s\src\core\Akka.TestKit\TestKitBase_Within.cs:line 57
[Node #2(first)] at Akka.TestKit.TestKitBase.Within[T](TimeSpan min, TimeSpan max, Func`1 function, String hint, Nullable`1 epsilonValue) in D:\a\1\s\src\core\Akka.TestKit\TestKitBase_Within.cs:line 134
[Node #2(first)] at Akka.TestKit.TestKitBase.Within(TimeSpan min, TimeSpan max, Action action, String hint, Nullable`1 epsilonValue) in D:\a\1\s\src\core\Akka.TestKit\TestKitBase_Within.cs:line 57
[Node #2(first)] at Akka.TestKit.TestKitBase.Within(TimeSpan max, Action action, Nullable`1 epsilonValue) in D:\a\1\s\src\core\Akka.TestKit\TestKitBase_Within.cs:line 32
[Node #2(first)] at Akka.Cluster.Sharding.Tests.ClusterShardingSpec.ClusterSharding_should_work_in_single_node_cluster() in D:\a\1\s\src\contrib\cluster\Akka.Cluster.Sharding.Tests.MultiNode\ClusterShardingSpec.cs:line 585
[Node #2(first)] at Akka.Cluster.Sharding.Tests.ClusterShardingSpec.ClusterSharding_specs() in D:\a\1\s\src\contrib\cluster\Akka.Cluster.Sharding.Tests.MultiNode\ClusterShardingSpec.cs:line 529
Sometimes it fails a little bit later with:
[Node #6(fifth)]Cause: [akka://DDataClusterShardingSpec/user/rebalancingCounterCoordinator#1327172867]: Akka.Actor.ActorInitializationException: Exception during creation ---> System.TypeLoadException: Error while creating actor instance of type Akka.Cluster.Tools.Singleton.ClusterSingletonManager with 3 args: (Akka.Actor.Props,<PoisonPill>,Akka.Cluster.Tools.Singleton.ClusterSingletonManagerSettings) ---> System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> Hocon.ConfigurationException: min-number-of-hand-over-retries must be >= 1
[Node #6(fifth)] at Akka.Cluster.Tools.Singleton.ClusterSingletonManager..ctor(Props singletonProps, Object terminationMessage, ClusterSingletonManagerSettings settings) in D:\a\1\s\src\contrib\cluster\Akka.Cluster.Tools\Singleton\ClusterSingletonManager.cs:line 562
[Node #6(fifth)]--- End of inner exception stack trace ---
[Node #6(fifth)] at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
[Node #6(fifth)] at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
[Node #6(fifth)] at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes)
[Node #6(fifth)] at Akka.Actor.Props.ActivatorProducer.Produce() in D:\a\1\s\src\core\Akka\Actor\Props.cs:line 639
[Node #6(fifth)] at Akka.Actor.Props.NewActor() in D:\a\1\s\src\core\Akka\Actor\Props.cs:line 575
[Node #6(fifth)]--- End of inner exception stack trace ---
[Node #6(fifth)] at Akka.Actor.Props.NewActor() in D:\a\1\s\src\core\Akka\Actor\Props.cs:line 577
[Node #6(fifth)] at Akka.Actor.ActorCell.CreateNewActorInstance() in D:\a\1\s\src\core\Akka\Actor\ActorCell.cs:line 351
[Node #6(fifth)] at Akka.Actor.ActorCell.<>c__DisplayClass117_0.<NewActor>b__0() in D:\a\1\s\src\core\Akka\Actor\ActorCell.cs:line 336
[Node #6(fifth)] at Akka.Actor.ActorCell.UseThreadContext(Action action) in D:\a\1\s\src\core\Akka\Actor\ActorCell.cs:line 375
[Node #6(fifth)] at Akka.Actor.ActorCell.NewActor() in D:\a\1\s\src\core\Akka\Actor\ActorCell.cs:line 342
[Node #6(fifth)] at Akka.Actor.ActorCell.Create(Exception failure) in D:\a\1\s\src\core\Akka\Actor\ActorCell.DefaultMessages.cs:line 422
[Node #6(fifth)]--- End of inner exception stack trace ---
[Node #6(fifth)] at Akka.Actor.ActorCell.Create(Exception failure) in D:\a\1\s\src\core\Akka\Actor\ActorCell.DefaultMessages.cs:line 439
[Node #6(fifth)] at Akka.Actor.ActorCell.SysMsgInvokeAll(EarliestFirstSystemMessageList messages, Int32 currentState) in D:\a\1\s\src\core\Akka\Actor\ActorCell.DefaultMessages.cs:line 256
So in both cases it is just trying to read one setting or another from Settings.Config and fails.
We need to check whether this is the root issue (maybe some HOCON problem). If so, this may change once we move to the standalone HOCON library.
If this is just the result of some inner exception and the settings are cleaned up due to another failure (is that possible? They should be immutable, right?), we need to understand what is going on under the hood.
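The second failure comes from a validation check on `min-number-of-hand-over-retries`, which suggests the value was read as 0 (or not found at all) when the singleton settings were built. One way to narrow this down is to pin the value explicitly in the spec's config. This is a minimal sketch, assuming the setting lives under `akka.cluster.singleton` as in the Akka.Cluster.Tools reference config; the value `10` is only illustrative and is not taken from this issue.

```hocon
# Hypothetical diagnostic override for the multi-node spec config (an assumption, not part of the current spec).
akka.cluster.singleton {
  # Illustrative value; the check in the ClusterSingletonManager constructor requires >= 1.
  min-number-of-hand-over-retries = 10
}
```

If the exception still shows up with the value pinned, the config object itself is probably empty or broken at read time, rather than a fallback simply being missing.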
Here is the part of the log showing what happens when the first node is trying to join the cluster and create the ClusterSingletonManager:
[Node #2(first)][INFO][1/23/2020 10:19:56 PM][Thread 0018][Cluster (akka://DDataClusterShardingSpec)] Cluster Node [akka.tcp://DDataClusterShardingSpec@localhost:1752] - Node [akka.tcp://DDataClusterShardingSpec@localhost:1752] is JOINING itself (with roles [backend]) and forming a new cluster
[Node #2(first)][INFO][1/23/2020 10:19:56 PM][Thread 0018][Cluster (akka://DDataClusterShardingSpec)] Cluster Node [akka.tcp://DDataClusterShardingSpec@localhost:1752] - Leader is moving node [akka.tcp://DDataClusterShardingSpec@localhost:1752] to [Up]
[Node #2(first)][INFO][1/23/2020 10:19:57 PM][Thread 0019][akka.tcp://DDataClusterShardingSpec@localhost:1752/user/TestCounterCoordinator] Singleton manager started singleton actor [akka://DDataClusterShardingSpec/user/TestCounterCoordinator/singleton]
[Node #2(first)][INFO][1/23/2020 10:19:57 PM][Thread 0019][akka.tcp://DDataClusterShardingSpec@localhost:1752/user/TestCounterCoordinator] ClusterSingletonManager state change [Start -> Oldest] Akka.Cluster.Tools.Singleton.Uninitialized
[Node #2(first)]---------------DISPOSING--------------------
[Node #2(first)][INFO][1/23/2020 10:19:58 PM][Thread 0019][akka.tcp://DDataClusterShardingSpec@localhost:1752/user/TestConductorClient] Terminating connection to multi-node test controller due to [Akka.Actor.FSMBase+Shutdown]
[Node #2(first)][INFO][1/23/2020 10:19:58 PM][Thread 0032][PlayerHandler (akka://DDataClusterShardingSpec)] Client: disconnecting [::1]:1758 from [::1]:4711
[Node #2(first)][WARNING][1/23/2020 10:19:58 PM][Thread 0018][akka://DDataClusterShardingSpec/user/rebalancingCounterCoordinator] DeadLetter from [akka://DDataClusterShardingSpec/system/cluster/$b#301104707] to [akka://DDataClusterShardingSpec/user/rebalancingCounterCoordinator#1586511960]: <Received dead letter from [akka://DDataClusterShardingSpec/system/cluster/$b#301104707]: Akka.Cluster.Tools.Singleton.StartOldestChangedBuffer>
[Node #2(first)][WARNING][1/23/2020 10:19:58 PM][Thread 0018][akka://DDataClusterShardingSpec/user/TestCounterCoordinator/singleton/coordinator] DeadLetter from [akka://DDataClusterShardingSpec/deadLetters] to [akka://DDataClusterShardingSpec/user/TestCounterCoordinator/singleton/coordinator#2037937872]: <Received dead letter from [akka://DDataClusterShardingSpec/deadLetters]: Akka.Cluster.Sharding.PersistentShardCoordinator+StateInitialized>
[Node #1(controller)][ERROR][1/23/2020 10:19:58 PM][Thread 0016][akka://DDataClusterShardingSpec/user/controller/barriers] unannounced disconnect of RoleName(first)
[Node #1(controller)]Cause: Akka.Remote.TestKit.BarrierCoordinator+ClientLostException: unannounced disconnect of RoleName(first)
[Node #1(controller)] at Akka.Remote.TestKit.BarrierCoordinator.<>c__DisplayClass14_0.<InitFSM>b__5(ClientDisconnected disconnected) in D:\a\1\s\src\core\Akka.Remote.TestKit\BarrierCoordinator.cs:line 523
[Node #1(controller)] at Akka.Case.With[TMessage](Action`1 action) in D:\a\1\s\src\core\Akka\PatternMatch.cs:line 107
[Node #1(controller)] at Akka.Remote.TestKit.BarrierCoordinator.<InitFSM>b__14_0(Event`1 event) in D:\a\1\s\src\core\Akka.Remote.TestKit\BarrierCoordinator.cs:line 528
[Node #1(controller)] at Akka.Actor.FSM`2.<>c__DisplayClass52_0.<OrElse>b__0(Event`1 event) in D:\a\1\s\src\core\Akka\Actor\FSM.cs:line 1101
[Node #1(controller)] at Akka.Actor.FSM`2.ProcessEvent(Event`1 fsmEvent, Object source) in D:\a\1\s\src\core\Akka\Actor\FSM.cs:line 1213
[Node #1(controller)] at Akka.Actor.FSM`2.Receive(Object message) in D:\a\1\s\src\core\Akka\Actor\FSM.cs:line 1115
[Node #1(controller)] at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message) in D:\a\1\s\src\core\Akka\Actor\ActorBase.cs:line 158
[Node #1(controller)] at Akka.Actor.ActorCell.ReceiveMessage(Object message) in D:\a\1\s\src\core\Akka\Actor\ActorCell.DefaultMessages.cs:line 177
[Node #1(controller)] at Akka.Actor.ActorCell.Invoke(Envelope envelope) in D:\a\1\s\src\core\Akka\Actor\ActorCell.DefaultMessages.cs:line 83
[Node #1(controller)][Akka.Remote.TestKit.MsgEncoder][Debug][1/23/2020 10:19:58 PM]Encoding Akka.Remote.TestKit.BarrierResult
[Node #1(controller)][Akka.Remote.TestKit.Proto.ProtobufEncoder][Debug][1/23/2020 10:19:58 PM][[::1]:4711-->[::1]:1763] Encoding {"barrier":{"name":"first-joined", "op": "Failed" } } into Protobuf
[Node #1(controller)][Akka.Remote.TestKit.MsgEncoder][Debug][1/23/2020 10:19:58 PM]Encoding Akka.Remote.TestKit.BarrierResult
[Node #1(controller)][Akka.Remote.TestKit.Proto.ProtobufEncoder][Debug][1/23/2020 10:19:58 PM][[::1]:4711-->[::1]:1765] Encoding {"barrier":{"name":"first-joined","op":"Failed"}} into Protobuf
[Node #1(controller)][Akka.Remote.TestKit.MsgEncoder][Debug][1/23/2020 10:19:58 PM]Encoding Akka.Remote.TestKit.BarrierResult
[Node #1(controller)][Akka.Remote.TestKit.Proto.ProtobufEncoder][Debug][1/23/2020 10:19:58 PM][[::1]:4711-->[::1]:1761] Encoding {"barrier":{"name":"first-joined","op":"Failed"}} into Protobuf
So there is some error, but it is not clear what happened or why RoleName(first) is reported as disconnected.
This is part of #3786, but dedicated to ClusterShardingSpec (Akka.Cluster.Sharding.Tests.MultiNode).
This test does not seem to be time-sensitive; at least the first failures are not related to timeouts.
Actually, most failures happen here:
akka.net/src/contrib/cluster/Akka.Cluster.Sharding.Tests.MultiNode/ClusterShardingSpec.cs
Lines 455 to 481 in cb91d7f
Attaching full log for details:
ClusterShardingSpec_fail_log.txt