Actor autoscaling with Keda doesn't work as expected #4768
Comments
@fabistb Please add any missing information
It's normal for an actor to not be present for a short period of time as it gets rebalanced, and this should be a retriable error that resolves once the tables are updated. Can you confirm whether this error still occurs on retries made after scaling has finished?
It's gone after our retry mechanism kicks in. Do I understand correctly that Dapr will also retry automatically after rebalancing?
Dapr will only retry calls if the error is a transient network error or an authentication error from the target sidecar. It will not retry if it's a missing actor error, so retrying here is up to your app.
Can this be done via the resiliency feature? /cc @halspang
In this case, resiliency doesn't cover it. Resiliency for actor state operations is handled at the component level. We could always bump it up a little bit to add retries around actor discovery if we think something like this is likely to be transient.
Actor instances not found can be treated as transient IMO.
@halspang, @artursouza, @yaron2, FYI. Thank you very much for looking into this. We tried Dapr 1.8.0-rc.3 and 1.8.0-rc.4 with our tests, and unfortunately we can still reproduce the "actor instance is missing" exception. Can we provide logs or anything else to help get this sorted out?
In what area(s)?
/area runtime
What version of Dapr?
1.7.2
Expected Behavior
If I use actor scaling, then actors should not fail.
Actual Behavior
Actor state API returns error "actor instance is missing" if actor scaling is enabled.
Detailed description about our findings
We have an actor that executes a process consisting of multiple tasks. After each task we call the state store API to store the current process state. We do this because we use a journal pattern, which lets us retrigger the process without re-executing steps that already completed successfully. This means that we call the state store many times during the actor lifetime.
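To illustrate why this pattern produces so many state calls, here is a rough sketch of a journal in Go. It is not the reporter's code: `StateStore`, `memStore`, `Step`, and `runWithJournal` are hypothetical names, and the in-memory store merely stands in for the actor state API.

```go
package main

import (
	"errors"
	"fmt"
)

// StateStore is a hypothetical stand-in for the actor state API.
type StateStore interface {
	Save(key string, done []string) error
	Load(key string) ([]string, error)
}

// memStore is an in-memory StateStore used only for this sketch.
type memStore map[string][]string

func (m memStore) Save(key string, done []string) error {
	m[key] = append([]string(nil), done...) // store a copy
	return nil
}

func (m memStore) Load(key string) ([]string, error) {
	return append([]string(nil), m[key]...), nil // return a copy
}

// Step is one named task of the process.
type Step struct {
	Name string
	Run  func() error
}

// runWithJournal runs steps in order, skips steps already recorded in the
// journal, and writes the journal after every successful step.
func runWithJournal(store StateStore, key string, steps []Step) error {
	done, err := store.Load(key)
	if err != nil {
		return err
	}
	completed := make(map[string]bool, len(done))
	for _, name := range done {
		completed[name] = true
	}
	for _, s := range steps {
		if completed[s.Name] {
			continue // finished in an earlier run
		}
		if err := s.Run(); err != nil {
			return fmt.Errorf("step %q failed: %w", s.Name, err)
		}
		done = append(done, s.Name)
		// One state store call per step: this is the call that can fail
		// while actors are being rebalanced.
		if err := store.Save(key, done); err != nil {
			return fmt.Errorf("journal write after %q failed: %w", s.Name, err)
		}
	}
	return nil
}

func main() {
	store := memStore{}
	fail := true
	runs := map[string]int{}
	steps := []Step{
		{"reserve", func() error { runs["reserve"]++; return nil }},
		{"charge", func() error {
			runs["charge"]++
			if fail {
				return errors.New("temporary failure")
			}
			return nil
		}},
	}
	fmt.Println(runWithJournal(store, "process-1", steps)) // first run stops at "charge"
	fail = false
	fmt.Println(runWithJournal(store, "process-1", steps)) // retrigger skips "reserve"
	fmt.Println("reserve runs:", runs["reserve"], "charge runs:", runs["charge"])
}
```

Because the journal is written after every step, a process with N steps makes at least N state writes, so even a short window in which the actor is missing from the table is likely to be hit under load.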
For this actor we have enabled actor scaling with Keda. During load testing we have seen a lot of "actor instance is missing" errors. After looking through the code we have seen that this error occurs only in the actor state API when no actor can be found in the actor table.
Store state:
dapr/pkg/http/api.go
Line 1489 in bbc1abc
Get state:
dapr/pkg/http/api.go
Line 1648 in bbc1abc
This means that a running actor stores its state and the state API says that the actor is not in the table. This should normally not happen. So we searched for places where the actor is deleted from the table. We found two places:
Actor deactivation:
dapr/pkg/actors/actors.go
Line 278 in bbc1abc
Actor rebalancing:
dapr/pkg/actors/actors.go
Line 653 in bbc1abc
For rebalancing we found the drainRebalancedActors configuration and tried disabling it, but that did not help: the issue still occurs. We then disabled automatic scaling by setting minReplicaCount and maxReplicaCount to the same number. After testing again, the issue was gone.
Steps to Reproduce the Problem
Release Note
RELEASE NOTE: FIX "actor instance is missing" error during actor scaling