My app experiences periodic connectivity issues with connections to Patroni pod, what should I do? #68

Closed
mitovskaol opened this issue Jan 19, 2021 · 0 comments


What is happening?

There have been reports on the OCP 4 Silver cluster of connectivity issues affecting connections from other pods (e.g. backup containers, API pods) to Patroni database pods. No other types of pods seem to be affected.
The database pods seem to be running fine, but other pods are not able to connect to them. This may result in the other pods failing liveness checks or just throwing connection timeout errors. Once the database pod was recycled, either manually or automatically after some time, the other pods were able to connect to it again.

Why is it happening?
The possible root cause of the issue has been traced to Aporeto, the Software Defined Network (SDN) solution running in the Silver cluster. Aporeto seems to occasionally remove the Processing Unit (PU) labels from the Patroni pods. This prevents the Network Security Policies (NSP) attached to the pod from being enforced, and therefore all communication to and from the pod is blocked in accordance with the Zero-Trust Model principle.

What can I do?
We highly recommend setting up health checks (readiness probes) for all pods that connect to Patroni database pods, and making sure they check every dependency the pod needs to operate healthily, including connections to other pods and external databases. If a health check detects a connection issue, delete and re-create the affected database pod. Increase the number of replicas to make the app more resilient to the failure of a single database pod (at least 3 replica pods are recommended).
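
As a rough illustration of a dependency-aware health check, here is a minimal Python sketch that an app's readiness endpoint could call. It only verifies that a TCP connection to each dependency can be opened, which is the failure mode described above (traffic to the database pod being blocked). The function names, the `patroni-master` service name and port 5432 are assumptions for illustration, not taken from any particular deployment; a real check might also run a simple query.

```python
import socket
from typing import Dict, Tuple


def can_reach(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


def readiness(
    dependencies: Dict[str, Tuple[str, int]], timeout_s: float = 3.0
) -> Tuple[bool, Dict[str, bool]]:
    """Check every dependency; the pod is ready only if all are reachable."""
    results = {
        name: can_reach(host, port, timeout_s)
        for name, (host, port) in dependencies.items()
    }
    return all(results.values()), results


if __name__ == "__main__":
    # Hypothetical service name and port; replace with your own.
    ready, detail = readiness({"database": ("patroni-master", 5432)})
    print("ready" if ready else "not ready", detail)
```

The readiness endpoint would return a failure status whenever `readiness` reports `False`, so that the connectivity problem shows up in the pod's status instead of only as timeout errors in the logs.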

What can Platform Services do?
We are working with Aporeto on releasing a fix for the removed PU labels, and expect to roll it out in Silver before the end of the week of Jan 18.
