Akka-Persistence: Zombie Actor on missing recovery permission (waitingRecoveryPermit) #28658
Comments
Can you clarify in what scenario the …
It could be a bug, so if you have any more information, @motmot80, that would be valuable.
We are still having this issue. The message wasn't lost; it seems the implementation itself is causing the problem: when a permit is buffered in … So maybe I took a wrong turn, but I think the … So it's possible to queue actor recoveries so as not to overload the persistence, without restarting the persistent actors several times. Thanks in advance.
What you describe is intended, but maybe there is a bug. I'll look into it. Let us know if you see what is wrong. The code for this is rather difficult in Eventsourced due to binary compatibility limitations.
That is how it is implemented: https://github.com/akka/akka/blob/master/akka-persistence/src/main/scala/akka/persistence/RecoveryPermitter.scala#L75 and there is a test for it: https://github.com/akka/akka/blob/master/akka-persistence/src/test/scala/akka/persistence/RecoveryPermitterSpec.scala#L81, so there must be something else going on.
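The queueing behaviour referenced above can be illustrated with a small, single-threaded model. This is a hypothetical sketch, not Akka's RecoveryPermitter: PermitModel, request and release are names invented for illustration, and actors are represented by plain strings. It only mirrors the idea that requests beyond max-concurrent-recoveries are buffered and granted in arrival order as permits are returned.

```scala
import scala.collection.mutable

// Hypothetical, single-threaded model of permit queueing; not Akka's
// RecoveryPermitter. Actors are represented by their names.
final class PermitModel(maxPermits: Int) {
  private var used = 0
  private val pending = mutable.Queue.empty[String]
  private val granted = mutable.ListBuffer.empty[String]

  // Grants a permit immediately if one is free; otherwise buffers the request.
  // Returns true if the permit was granted right away.
  def request(actor: String): Boolean =
    if (used < maxPermits) {
      used += 1
      granted += actor
      true
    } else {
      pending.enqueue(actor)
      false
    }

  // Called when an actor finishes recovery. The freed permit goes straight to
  // the oldest pending request, if there is one; otherwise it becomes free.
  def release(): Option[String] = {
    require(used > 0, "no permit is currently in use")
    if (pending.nonEmpty) {
      val next = pending.dequeue()
      granted += next
      Some(next)
    } else {
      used -= 1
      None
    }
  }

  def grantedSoFar: List[String] = granted.toList
  def waiting: List[String] = pending.toList
  def inUse: Int = used
}
```

In this model a queued request only moves forward when some other actor calls release; there is no notion of time, so a waiter can stay queued for as long as the permits are held.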
After reading the original issue description again, it seems like you would like to have a configurable timeout for how long to wait for the permit. Your system is overloaded, and you would prefer to stop the actors when they have been waiting too long for the permit. That seems like a fair request.
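A configurable wait timeout like the one discussed here could behave roughly as follows. This is a hypothetical sketch under stated assumptions: PermitWaiter, onTick and the waitTimeout parameter are invented names, waitTimeout stands in for a proposed setting that does not exist in Akka, and the current time is passed in explicitly instead of being driven by a scheduler so the logic stays easy to test.

```scala
import scala.concurrent.duration._

// Hypothetical states of an actor that waits for a recovery permit
// but gives up after `waitTimeout`. None of these names exist in Akka.
sealed trait WaiterState
case object WaitingForPermit extends WaiterState
case object Recovering extends WaiterState
case object StoppedOnTimeout extends WaiterState

// `waitTimeout` stands in for a proposed setting such as
// max-concurrent-recoveries-wait-timeout; time is given in nanoseconds.
final class PermitWaiter(waitTimeout: FiniteDuration, startedAtNanos: Long) {
  private var state: WaiterState = WaitingForPermit

  def current: WaiterState = state

  // The grant only counts while still waiting; a grant that arrives
  // after the timeout is ignored because the actor has already given up.
  def onPermitGranted(): Unit =
    if (state == WaitingForPermit) state = Recovering

  // Called periodically, e.g. on a timer tick, with the current time.
  def onTick(nowNanos: Long): Unit =
    if (state == WaitingForPermit && nowNanos - startedAtNanos >= waitTimeout.toNanos)
      state = StoppedOnTimeout
}
```

In an actual actor the tick could come from a timer (for example, Akka's Timers trait). A real implementation would also need to make sure the pending request of an actor that gave up is dropped from the permitter's queue, so that a permit is not handed to a stopped actor.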
We are having issues with PersistentActors staying in the waitingRecoveryPermit state when the RecoveryPermitGranted answer to the permit request is lost in high-load scenarios. See akka/akka-persistence/src/main/scala/akka/persistence/Eventsourced.scala, line 601 in 6fe2f66.
Because every message is stashed, there's no way to stop or restart the actor in this state.
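The stuck state described above can be modelled in a few lines. This is a hypothetical sketch, not the Eventsourced implementation: WaitingActorModel and the message types are invented, recovery itself is omitted, and StopCommand represents an application-level stop request sent as an ordinary message. While waiting, everything except the grant is stashed, so the stop request is only processed once a grant arrives.

```scala
import scala.collection.mutable

// Hypothetical message types for the model; not Akka's internal protocol.
sealed trait Msg
case object PermitGranted extends Msg
case object StopCommand extends Msg
final case class Command(payload: String) extends Msg

// Hypothetical model of an actor in the "waiting for permit" state:
// every message except the grant is stashed. Recovery itself is omitted.
final class WaitingActorModel {
  private val stash = mutable.Queue.empty[Msg]
  private val processed = mutable.ListBuffer.empty[Msg]
  private var waiting = true
  private var isStopped = false

  def receive(msg: Msg): Unit =
    if (!waiting) handle(msg)
    else msg match {
      case PermitGranted =>
        waiting = false
        // Replay stashed messages in arrival order once the permit arrives.
        while (stash.nonEmpty) handle(stash.dequeue())
      case other =>
        stash.enqueue(other)
    }

  private def handle(msg: Msg): Unit = {
    processed += msg
    if (msg == StopCommand) isStopped = true
  }

  def stopped: Boolean = isStopped
  def stashed: List[Msg] = stash.toList
  def processedSoFar: List[Msg] = processed.toList
}
```

If the grant never arrives, StopCommand stays in the stash indefinitely, which is the "zombie" behaviour this issue describes.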
In addition to the existing max-concurrent-recoveries setting, there should be a setting for how long the PersistentActor should wait for a permit before stopping itself (die trying), for example something like max-concurrent-recoveries-wait-timeout.

Thanks in advance and best regards
Thomas