Sidecar connection issues with placement service with on-disk raft logs #7749
Comments
/assign
I am preparing to use dapr_placement.cluster.forceInMemoryLog=true in a production environment, and I have not enabled high availability (HA) before, so this issue makes me very concerned.
I updated the title and description to reflect that we actually use on-disk raft logs. I saw that the Dapr flag defaults to in-memory and assumed that's what we were using (see here); however, we actually use the Helm value default, which sets placement to on-disk raft logs. So the issue in my case is specific to on-disk raft logs. Ping @Cherrs - just wanted to let you know about the update :)
Isn't this Helm default ( |
Yeah, that's correct, we can only have on disk log when we have a placement cluster:
|
In what area(s)?
/area placement
What version of Dapr?
latest
Relates to: #4881 and #4882 and #5583
Expected Behavior
Sidecar can connect to the placement service without issue and has built-in fault tolerance for connection issues with placement as they arise. A stable connection between sidecar and placement service is necessary for workflows to function properly.
Actual Behavior
Sidecar disconnects, has issues reconnecting, and then is able to connect with placement service. This is a single instance of placement with on-disk logs for raft.
abbreviated logs from sidecar:
I also saw logs from the sidecar showing
only leader can serve the request
when I'm running a single instance, so maybe placement somehow lost its leadership lease even though there is only one instance; with no other instance to pick up the lease, the sidecar gets stuck. However, the interesting thing here is that to fix the problem, I had to restart the sidecar and recycle the TCP connection to placement. Then the sidecar could connect properly.
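To make the expected fault tolerance concrete, here is a minimal sketch of the behaviour I would hope for: on a failure, drop the old connection, open a fresh one, and retry with exponential backoff instead of staying stuck on a stale connection. This is not Dapr's actual reconnect logic. `connect_with_backoff`, `dial`, and `NotLeaderError` are hypothetical names I made up for illustration, and the delays are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NotLeaderError(ConnectionError):
    """Hypothetical stand-in for 'only leader can serve the request'."""


def connect_with_backoff(
    dial: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call dial() until it succeeds, opening a fresh connection each attempt."""
    for attempt in range(max_attempts):
        try:
            # dial() is assumed to create a brand-new connection every time,
            # so a stale TCP connection is never reused after a failure.
            return dial()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

The point of passing `dial` as a callable is that every retry recycles the connection, which is what restarting the sidecar did for me manually.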
Another interesting observation: on another sidecar I see this repeat every 2 hours. Something sends a termination signal to placement, placement shuts down, and the sidecar then has issues connecting. The sidecar's connection problems make sense, since the placement service is down and spinning back up, but it is unclear to me what sends the termination signal to placement, or why. This also occurs after I see in placement:
Steps to Reproduce the Problem
I'm not even sure why it is occurring tbh, so this is going to be challenging to reproduce. More exploration is needed to give a better answer here.
Release Note
RELEASE NOTE:
FIX Bug in placement with on-disk Raft logs preventing proper connection with sidecar.