Maybe when the Vault client returns an error response (other than "not found") we can call |
-
We're seeing an issue with Vault in Concourse v5.7.2. In our setup, Vault is fronted by a load balancer, and behind it are a primary (active) cluster and a secondary (standby) cluster.
When the primary Vault cluster has an outage, the load balancer fails over to the secondary cluster. However, the secondary cluster is not a performance cluster and does not handle clients' read/write requests, so every request gets an error response (HTTP 400) from the secondary Vault server.
The secondary cluster has to be manually promoted before it can handle requests. For short-lived outages the promotion is never done, because the primary cluster is available again shortly afterwards.
The problem is that Concourse appears to keep talking to the secondary Vault server even after the primary Vault server has come back up.
It seems that Concourse reuses established connections, so it gets stuck on the secondary Vault server in this case.
As a workaround, we restart the web nodes every time this happens. After the restart, Concourse connects to the primary Vault server and everything is fine.
We're looking for a way to fix this permanently.