-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add async startup #5436
add async startup #5436
Conversation
Breaking changes in here for sure break compat with Phobos but we'll need to test to see if this resolves the regression in 1.4.29 |
If there are conflicts with phobos, we can resolve them later on |
Yep, it's not a deal breaker. Fixing clustering is the bigger priority. |
@Aaronontheweb Why did the test failed? |
Part of the test suite locked up and timed out after 30 minutes, which means we don't get the report indicating which test failed. And wherever it failed, it did it consistently. Need to click through to the build log to see it: Looks like it's the Akka.FSharp specs |
Yes, i saw this too but don't understand why it failed and how to resolve it. |
Looks like the remoting system deadlocked - see all of the |
I think i found it, its something in RemoteActorRefProvider.CreateInternals() |
one other major problem is the logging. One other suspicious thing is the terminator routine, I think in the last test-run (one before the last commit) there was an NRE on startup and the start-finalizer deathlocked |
I think this is great progress - just need to keep whittling down the issues the test suite raises |
Is your issue related to this? #4424 |
It's still an issue with remoting. If I had to guess, the problem here is that the |
{ | ||
var extensions = LoadExtensions(); | ||
foreach (var init in extensions.OfType<IInitializable>()) | ||
await init.InitializeAsync(cancellationToken); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is part of your problem - the old system had ordering in-place created as an effect of lazy loading. If Cluster
depends on the RARP
extension, Cluster
would start RARP
if it wasn't created already and would load the cached value.
Scanning the assembly and arbitrarily front-loading the extensions is non-deterministic and will result in the types of deadlocks you're seeing now.
I wouldn't front-load any of the extensions at all - lazy loading until they're needed on-demand is still the better approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, one simple fix here to see if it resolves the deadlock - don't await
each individual plugin starting up. Compose them into a Task.WhenAll
and await
on the group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only Cluster and the RemoteActorRefProvider are supporting it,
yes the issue is WHO and WHEN the Cluster Extensions get cold and exit
|
||
// This is effectively a write-once variable similar to a lazy val. The reason for not using a lazy val is exception | ||
// handling. | ||
private volatile HashSet<Address> _addresses; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why replace this with a Volatile.Write
down below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a multi read and only single write field.
volatile as perf issues if something is accessing it multiple times
_provider.Init(this); | ||
if (_provider is IInitializable init) | ||
await init.InitializeAsync(cancellationToken); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we call Init
and InitializeAsync
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IInitializable
is a feature faced, it does not replace the old API
@@ -302,13 +337,13 @@ private void StopScheduler() | |||
sched?.Dispose(); | |||
} | |||
|
|||
private void LoadExtensions() | |||
private List<object> LoadExtensions() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we type this usefully in order to avoid boxing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This are only the extensions loaded from config, excluding lazy loaded extensions
It was simple the requirement that the The whole extensions system is none-deterministic. Changing the Dispatcher can change the extension loading order. |
I can not reproduce the lock localy. Something with the UnitTest dispatcher and most likely the Extesions-Lazy<>.Value is locking somewhere/somehow. I mark this one as WIP |
I think it'd be worth writing up a spec on this first and reworking the problem from the top down that way |
If I could reproduce the behavior then we could write a spec for it. I think that I found now the issue the |
@Aaronontheweb Would need help,
I removed basically all from the The call sequence didn't change and still there is some death lock, The secquence is I could even remove the The only idea what i have left is inside
that is just simply death-locking instantly are the answer-mailbox is not processed |
@Aaronontheweb I moved the fixes out to a new PR and open a discussion for the startup here: |
No description provided.