Is your feature request related to a problem?
CiliumEndpoint (CEP) object updates appear to impose a scalability limit. Each CEP today requires at least 1 CREATE and 1 UPDATE request to the k8s api-server. Because every Cilium agent watches all CEPs, this translates to `2*#Nodes` watch events per CEP, even for CEPs that are completely static (no updates after creation). In an environment with many nodes and some amount of pod churn, this alone can overload the api-server.
In our test on GKE with 1k nodes and a pod churn rate of 30, we observed 1 CREATE and 2 UPDATE requests for each CEP (one UPDATE sets the status to `wait-for-identity`, the other sets it to `ready`). We estimate the resulting watch event rate at roughly 90k/s, which would overwhelm an `n1-standard-96` master.
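The 90k/s estimate can be reproduced with a back-of-the-envelope calculation. The sketch below is illustrative only and rests on assumptions not stated in the report: that the churn rate of 30 means 30 pods per second, that each write to a CEP produces exactly one watch event per node, and that DELETE events are ignored. The function name `watch_events_per_second` is hypothetical.

```python
def watch_events_per_second(nodes: int, cep_churn_per_sec: float, writes_per_cep: int) -> float:
    """Estimate the api-server watch event rate caused by CEP writes.

    Assumption: every agent (one per node) watches all CEPs, so each
    write to a CEP fans out to one watch event per node.
    """
    return nodes * cep_churn_per_sec * writes_per_cep


# Observed behaviour: 1 CREATE + 2 UPDATE per CEP.
observed = watch_events_per_second(nodes=1000, cep_churn_per_sec=30, writes_per_cep=3)

# Proposed behaviour: a single CREATE per CEP, with no status updates.
proposed = watch_events_per_second(nodes=1000, cep_churn_per_sec=30, writes_per_cep=1)

print(f"observed: {observed:.0f} events/s, proposed: {proposed:.0f} events/s")
```

Under these assumptions, a single CREATE per CEP would cut the event rate to a third of the observed figure.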
Describe the solution you'd like
We'd like to trim down CEP updates by removing the status subresource completely and creating a CEP only once all required info has been gathered (IP, identity, etc.). This should allow us to scale better in the short term.
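As a rough illustration of the idea (not Cilium's actual implementation), the sketch below defers the CREATE until both the IP and the identity are known, so the api-server sees one request per endpoint. All names here (`EndpointInfo`, `FakeAPIServer`, `sync_endpoint`) are hypothetical, and the in-memory "api-server" only records requests.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class EndpointInfo:
    """Locally gathered endpoint state (hypothetical model)."""
    name: str
    ip: Optional[str] = None
    identity: Optional[int] = None

    def is_complete(self) -> bool:
        # The proposal mentions IP and identity; other fields are omitted here.
        return self.ip is not None and self.identity is not None


class FakeAPIServer:
    """Stand-in for the k8s api-server that only records requests."""
    def __init__(self) -> None:
        self.requests: List[Tuple[str, str]] = []

    def create(self, info: EndpointInfo) -> None:
        self.requests.append(("CREATE", info.name))


def sync_endpoint(api: FakeAPIServer, info: EndpointInfo, created: Set[str]) -> bool:
    """Create the CEP once, and only when all required info is present."""
    if info.name in created or not info.is_complete():
        return False
    api.create(info)
    created.add(info.name)
    return True


api = FakeAPIServer()
created: Set[str] = set()
ep = EndpointInfo(name="pod-a")

sync_endpoint(api, ep, created)   # nothing sent: IP and identity missing
ep.ip = "10.0.0.7"
sync_endpoint(api, ep, created)   # nothing sent: identity still missing
ep.identity = 1234
sync_endpoint(api, ep, created)   # single CREATE is issued here
sync_endpoint(api, ep, created)   # no-op: already created

print(api.requests)
```

The trade-off in this sketch is that intermediate states such as `wait-for-identity` are no longer visible through the api-server; they would exist only in the agent's local state.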
This was discussed on Slack with @aanm and @tgraf, and was also proposed in the 03/01/2021 dev meeting (there's a bit more context in the meeting notes too).
Proposal / RFE