Akka.Remote starts listening on different port than specified #4353
I went back in time and checked out a repro (#3953) that @Aaronontheweb was trying out the other day for #3943. I applied the fallback configs and the cluster formed itself... :/

```csharp
public class Bugfix3943Specs
{
    public static string[] Hocons =
    {
        "akka : {\r\n actor : {\r\n provider : cluster }}",
        "akka : {\r\n stdout-loglevel : INFO\r\n loglevel : INFO\r\n log-config-on-start : on\r\n loggers : [\"Akka.Logger.Serilog.SerilogLogger, Akka.Logger.Serilog\"] \r\nactor : {\r\n debug : {\r\n receive : on\r\n autoreceive : on\r\n lifecycle : on\r\n event-stream : on\r\n unhandled : on\r\n } } }",
        "akka : {\r\n remote : {\r\n dot-netty : {\r\n tcp : {\r\n log-transport : true\r\n transport-class : \"Akka.Remote.Transport.DotNetty.TcpTransport, Akka.Remote\"\r\n transport-protocol : tcp\r\n hostname : 0.0.0.0\r\n public-hostname : localhost\r\n port : 9000\r\n } } } }",
        "akka : {\r\n cluster : {\r\n log-info : on\r\n seed-nodes : [\"akka.tcp://System@localhost:9000\"]\r\n roles : []\r\n role : {}\r\n } }"
    };

    [Fact]
    public async Task Bugfix3943_should_start_ActorSystem()
    {
        var config = ConfigurationFactory.ParseString("");
        foreach (var hocon in Hocons)
        {
            config = config.WithFallback(global::Akka.Configuration.ConfigurationFactory.ParseString(hocon));
        }

        var system = ActorSystem.Create("System", config);
        var tcs = new TaskCompletionSource<Done>();
        Cluster.Get(system).RegisterOnMemberUp(() =>
        {
            tcs.SetResult(Done.Instance);
        });
        var cts = new CancellationTokenSource();
        cts.Token.Register(tcs.SetCanceled);
        cts.CancelAfter(TimeSpan.FromSeconds(5));
        await tcs.Task;
    }
}
```
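The completion/timeout wiring in the spec (RegisterOnMemberUp completing a TaskCompletionSource, with a CancellationTokenSource cancelling it after 5 seconds) can be exercised in isolation. Below is a minimal sketch of just that pattern with no Akka dependency; the `MemberUpWait.RunAsync` helper and its names are mine, not from the spec.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class MemberUpWait
{
    // Wait for a callback to fire, or cancel after `timeout`.
    // Mirrors the spec's TaskCompletionSource + CancellationTokenSource wiring.
    public static async Task<string> RunAsync(Action<Action> registerCallback, TimeSpan timeout)
    {
        var tcs = new TaskCompletionSource<bool>();
        registerCallback(() => tcs.TrySetResult(true)); // stands in for RegisterOnMemberUp

        using var cts = new CancellationTokenSource();
        cts.Token.Register(() => tcs.TrySetCanceled());
        cts.CancelAfter(timeout);

        try
        {
            await tcs.Task;
            return "member up";
        }
        catch (TaskCanceledException)
        {
            return "timed out";
        }
    }
}
```

If the callback never fires (e.g. the cluster never forms), awaiting the task surfaces a `TaskCanceledException` once the timer trips, which is exactly how the spec fails. Note the sketch uses `TrySetResult`/`TrySetCanceled` rather than the spec's `SetResult`/`SetCanceled`, so a near-simultaneous completion and cancellation cannot throw `InvalidOperationException`.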
The behavior seems to be gone after I terminated a process that was blocking one of the default ports. I'll reopen if the issue persists.
Thanks! Let us know if you run into any more trouble.
@Aaronontheweb I'm actually working on reproducing this again :D
What’s changed?
@Aaronontheweb ReproSpec.cs:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Akka.Actor;
using Akka.Configuration;
using Xunit;

namespace Akka.Cluster.Tests
{
    public class ReproSpec
    {
        public static string[] Hocons =
        {
            "akka : {\r\n actor : {\r\n provider : cluster }}",
            "akka : {\r\n stdout-loglevel : INFO\r\n loglevel : INFO\r\n log-config-on-start : on\r\n loggers : [\"Akka.Logger.Serilog.SerilogLogger, Akka.Logger.Serilog\"] \r\nactor : {\r\n debug : {\r\n receive : on\r\n autoreceive : on\r\n lifecycle : on\r\n event-stream : on\r\n unhandled : on\r\n } } }",
            "akka : {\r\n remote : {\r\n dot-netty : {\r\n tcp : {\r\n log-transport : true\r\n transport-class : \"Akka.Remote.Transport.DotNetty.TcpTransport, Akka.Remote\"\r\n transport-protocol : tcp\r\n hostname : 0.0.0.0\r\n public-hostname : core-cluster-svc-0.core-cluster-svc\r\n port : 5000\r\n } } } }",
            "akka : {\r\n cluster : {\r\n log-info : on\r\n seed-nodes : [\"akka.tcp://System@core-cluster-svc-0.core-cluster-svc:5000\",\"akka.tcp://System@core-cluster-svc-1.core-cluster-svc:5000\",\"akka.tcp://System@core-cluster-svc-2.core-cluster-svc:5000\"]\r\n roles : [seed]\r\n role : { } } }"
        };

        [Fact]
        public async Task Should_start_ActorSystem()
        {
            var config = ConfigurationFactory.ParseString("");
            foreach (var hocon in Hocons)
            {
                config = config.WithFallback(global::Akka.Configuration.ConfigurationFactory.ParseString(hocon));
            }

            var system = ActorSystem.Create("System", config);
            var tcs = new TaskCompletionSource<Done>();
            Cluster.Get(system).RegisterOnMemberUp(() =>
            {
                tcs.SetResult(Done.Instance);
            });
            var cts = new CancellationTokenSource();
            cts.Token.Register(tcs.SetCanceled);
            cts.CancelAfter(TimeSpan.FromSeconds(5));
            await tcs.Task;
        }
    }
}
```

Upon execution, a ... I'm wondering what the issue is; the fallback responsible for configuring akka.remote is:
Correct me if I'm wrong (just in case), but in HOCON terms:
should be equivalent to:
Or perhaps my approach to composing the config (from multiple fallbacks) is completely bonkers? I'm wondering what causes this :/
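For reference, the equivalence being asked about here is between HOCON's nested-object form and its dotted-path form. These two fragments should parse to the same configuration tree (a minimal illustration using the `port` setting from the configs above):

```hocon
# nested-object form
akka {
  remote {
    dot-netty {
      tcp {
        port : 9000
      }
    }
  }
}

# dotted-path form; should yield the identical tree
akka.remote.dot-netty.tcp.port = 9000
```

Per the HOCON specification, a dotted key is exactly shorthand for nesting, so a correct parser must treat the two identically when merging fallbacks.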
@Aaronontheweb I've done some more debugging. RemoteActorRefProvider.cs:

```csharp
public virtual void Init(ActorSystemImpl system)
{
    _system = system;
    _local.Init(system); // <-- NullReferenceException
    _actorRefResolveThreadLocalCache = ActorRefResolveThreadLocalCache.For(system);
    _actorPathThreadLocalCache = ActorPathThreadLocalCache.For(system);
    _remotingTerminator =
        _system.SystemActorOf(
            RemoteSettings.ConfigureDispatcher(Props.Create(() => new RemotingTerminator(_local.SystemGuardian))),
            "remoting-terminator");
    _internals = CreateInternals(); // <-- This is where the requested Internals are being initialized
    _remotingTerminator.Tell(RemoteInternals);
    Transport.Start();
    _remoteWatcher = CreateRemoteWatcher(system);
    _remoteDeploymentWatcher = CreateRemoteDeploymentWatcher(system);
}
```

Race condition?
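If the internals really are being read before `Init` has assigned them, that is the classic unsynchronized-publication failure mode. A minimal, self-contained sketch of the shape of that bug, and how `Lazy<T>` sidesteps the ordering requirement (the class and field names below are hypothetical, not Akka's actual members):

```csharp
using System;

// Hypothetical provider: a consumer reading the internals before Init()
// runs gets a NullReferenceException, analogous to the question above.
class EagerProvider
{
    private string _internals; // assigned only inside Init()

    public void Init() => _internals = "ready";

    public int InternalsLength => _internals.Length; // NRE if Init() not called yet
}

// Lazy<T> removes the ordering requirement: the factory runs at most once,
// thread-safely, on first access, no matter who asks first.
class LazyProvider
{
    private readonly Lazy<string> _internals =
        new Lazy<string>(() => "ready");

    public int InternalsLength => _internals.Value.Length;
}
```

With `EagerProvider`, correctness depends on every caller being sequenced after `Init`; with `LazyProvider`, first access triggers initialization, so there is no window in which the value is observed as null.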
Got an NRE inside the
Ok, so why is it trying to get the internals before they've been initialized?
If it's a handled ... see akka.net/src/core/Akka/Serialization/Serialization.cs, lines 114 to 123 at b62842e.
I should note - I haven't looked at the logs yet, but @Arkatufus is taking a look at this issue.
From your logs, it looks like the system is binding to ... Although the socket bind error is from Petabridge.Cmd.
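The bind-versus-advertise distinction that the logs surface is controlled by two separate DotNetty settings. As a reference fragment (values taken from the local-repro config quoted earlier in this thread):

```hocon
akka.remote.dot-netty.tcp {
  hostname : 0.0.0.0           # address the socket actually binds to
  public-hostname : localhost  # address advertised to other nodes in remote addresses
  port : 9000                  # listening port
}
```

Binding to `0.0.0.0` while advertising `public-hostname` is the usual arrangement inside containers, where the routable name differs from any local interface address.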
Checking
hmmm yes, just noticed that too,
Could be related - the logging system has to grab the transport info at startup.
I tried your debugging code and I can't quite reproduce the bug; it did fail, but only because it couldn't connect to the seed node, which is missing.
@Arkatufus the HOCON & logs were collected from 1 of 3 pods running in a k8s StatefulSet, specifically from
What do you think is missing?
This is what I got when I tried to run your code above:
This is the code I use to try and reproduce the bug:
So the error came from one of the k8s seed nodes?
Yes, actually the entire deployment is failing at the moment.
Ok, let me read the logs, thanks.
It is failing at this line:
Can you try adding
sure, let me give it a try
Encountered some tech issues with my stack; I'll get back to you after I get this to run under the setting you've specified. The line you've indicated makes sense, but there's also an interesting observation concerning https://getakka.net/articles/configuration/akka.cluster.html. The observation would be that having:
is not allowed - correct?
It should be allowed. I'm trying to find out if it is a bug in HOCON; I was hoping that providing a value could be a temporary bandaid until I can find the real culprit.
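For context, the suspect construct in the cluster fallback above is an empty-object value; a minimal fragment showing it next to an explicitly-valued key (`min-nr-of-members` is the setting later reported in this thread to help, and the value 3 is an assumption matching the three-pod StatefulSet):

```hocon
akka.cluster {
  roles : [seed]
  role : {}               # empty object value; suspected HOCON parsing trouble
  min-nr-of-members : 3   # explicitly-valued sibling key; the bandaid tried later
}
```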
Found an interesting thing. This is the HOCON that's being parsed using
Can you repro that on your end?
Common culprit might be the
just noticed that, compared to the docs, I seem to be missing a
retesting
Yep, that was it.
retesting
It's supposed to be
ETA 10m for my CI to process the cfg change so I can redeploy it. In the meantime, I really think I should write an MNTK test with this config to verify its correctness. @Aaronontheweb is there a sample I can reference for that?
Well, I tried to run the line of code that was failing inside
AFAIK I'm not explicitly deleting anything from the configuration.
The deployment crashed again; log output similar to the old one:
Pod stdout log
The akka.cluster HOCON piece was:
Ok, I'll try to reproduce it and post my findings here.
I still can't reproduce it. Are you sure all of the pods are using the same configuration?
Yes, all pods have the same config; only the public hostname varies from pod to pod.
In your HOCON:
The 1st seed node refers to a different actor system than the other 2 - is that intentional?
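The constraint behind that question: every seed-node URI embeds the actor system name, and all of them must match the name passed to `ActorSystem.Create`. A consistent list for the three-pod StatefulSet above would look like:

```hocon
# The system name ("System") must be identical in every entry and must
# match ActorSystem.Create("System", config) on each node.
akka.cluster.seed-nodes = [
  "akka.tcp://System@core-cluster-svc-0.core-cluster-svc:5000",
  "akka.tcp://System@core-cluster-svc-1.core-cluster-svc:5000",
  "akka.tcp://System@core-cluster-svc-2.core-cluster-svc:5000"
]
```

A mismatched system name in any entry makes that seed unreachable, since remoting rejects associations addressed to a differently-named system.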
I've run a successful test with docker-compose. The HOCON appears to be syntactically correct.
@mmisztal1980 so adding the min-nr-of-members fixed the issue? |
It worked in docker-compose, but I'm still seeing crashes in k8s (NRE); I'm trying to determine the difference.
@mmisztal1980 please check #4358, I can't guarantee a 100% fix, but I think I fixed the major ones |
* Add bug spec
* Missing keys during merge
* Remove weird workaround that doesn't make any sense
* How roots are calculated needs to be reversed
* Original node value needs to be preserved
* Use Config.Root to standardize Config access
* Possible fix for "collection was modified" exception during merge
* Minor optimization
@mmisztal1980 try tonight's nightly, 1.4.3-beta637207629595381843 - that has this fix included. |
on it |
@Aaronontheweb initial result is negative; I'll do some more testing tomorrow.
Stacktrace
@Aaronontheweb @Arkatufus I'm being cautiously optimistic right now; I'm going to retest with the
kubectl logs
Looks like we're in business :)
kubectl logs core-cluster-svc-0
kubectl logs core-cluster-svc-1
kubectl logs core-cluster-svc-2
Healthcheck
So is there still a bug on our end?
The fix by @Arkatufus seems to have helped, so I'm more than happy to close the issue. What I'm still wondering about, though, are the NRE stack traces I've posted above. Specifically, I'm wondering about:
Stacktrace
…kkadotnet#4358)" This reverts commit 3788ced.
Hi guys,
I've returned to one of my pet projects (a lighthouse alternative) after some time off. In the project I have a bootstrapper that generates HOCON fallbacks, from which the configuration is built.
Upon startup, I'm noticing that:
Remote.conf
(2552)
HOCON, pseudocode and log outputs located here