Automatic rate limiting of endpoint API calls #13319
Conversation
Force-pushed from 4768bd3 to 15ee965 (compare)
test-me-please
Some small nits while skimming through the implementation. Great documentation - in godoc as well as the Cilium docs!
There are a few things around the estimated duration, though, that I was wondering about and didn't find an answer to:
- How were the estimated process durations determined?
- How should a potential future user of this API estimate the duration? Might make sense to add instructions to the docs?
- What happens if the estimation is off significantly?
daemon/cmd/endpoint.go (Outdated)
```
@@ -137,11 +148,19 @@ func NewGetEndpointIDHandler(d *Daemon) GetEndpointIDHandler {
 func (h *getEndpointID) Handle(params GetEndpointIDParams) middleware.Responder {
 	log.WithField(logfields.EndpointID, params.ID).Debug("GET /endpoint/{id} request")

+	r, err := h.d.apiLimiterSet.Wait(params.HTTPRequest.Context(), apiRequestEndpointGet)
+	if err != nil {
+		return api.Error(responseTooManyRequests, err)
```
@tgraf IIUC there are primarily two paths for a failed API request: 1. the request exceeds the maximum wait time, 2. the request exceeds the rate limiting settings. The latter will return immediately with a return code of responseTooManyRequests (429), whereas the former will return with another failure.
My questions, as a user are: does Cilium somehow update the status on the pod to reflect the cause of the error? Secondly, since rate limiting is intended as a means to reduce pressure on the cilium agent, what is the recommended course of action for the user upon receiving responseTooManyRequests? Again, my understanding is that the user has two options: either to retry after a short amount of time, or to adjust the rate-limiting configuration and then retry immediately. My issue with the second approach is that the end user--the one who initiated the pod creation--typically won't have the ability to make cluster wide changes. Hence, my question is, what is the mode for interrogation and the passing of these failures and settings across the multiple components (end-user, k8s operator, cilium)?
does Cilium somehow update the status on the pod to reflect the cause of the error?
The error will surface in kubectl describe pod and kubectl get events, so you will see that CNI creation failed because of too many requests. This gives you better visibility than before, where kubelet/the runtime would time out and retry silently.
Secondly, since rate limiting is intended as a means to reduce pressure on the cilium agent, what is the recommended course of action for the user upon receiving responseTooManyRequests?
In general it should resolve itself automatically as kubelet will continue to retry. It will avoid the dangerous CNI ADD / CNI DEL cycle after which some resources have already been invested and then have to be undone.
Again, my understanding is that the user has two options: either to retry after a short amount of time,
kubelet will take care of this. The difference will be that it is done with visibility and it fails upfront without an investment of CPU resources already committed.
or to adjust the rate-limiting configuration and then retry immediately.
Adjustment of rate-limiting configuration should only be needed if the automatic adjustment doesn't allow for the required number of pod creations (or other API calls) even if low processing durations are observed. If enough resources are available, the default rate limits will automatically be relaxed further. An example where things could not work out well automatically: if an API call is estimated to take 2 seconds with 4 parallel requests but it only takes 0.5 seconds and you really need 8 parallel requests at all times. The auto adjustment won't allow for this and manual interaction is required to adjust the limits.
Another example is if the endpoint creation is much slower than expected, let's say it takes 10s instead of 2s. Then only one parallel endpoint request will be allowed as ParallelRequests will be shrunk down automatically.
Based on the metrics, it should be possible to tell immediately whether automatically adjusted limits are reasonable or not. We will be happy to optimize with you.
My issue with the second approach is that the end user--the one who initiated the pod creation--typically won't have the ability to make cluster wide changes. Hence, my question is, what is the mode for interrogation and the passing of these failures and settings across the multiple components (end-user, k8s operator, cilium)?
Agreed. The user should not really see this, similar to how a k8s user typically doesn't see when the k8s apiserver returns 429 (it does; retries are done automatically). If too many retries are needed, the platform operator has to step in and resize the VM, adjust the limits, and so on. It will be the same with these limits. Observation should be done via the metrics, so you can see the mean waiting duration; if that gets out of hand, you can check why that is the case and determine whether to adjust the limits or fix resource constraints.
Force-pushed from ed1a400 to dc7beb0 (compare)
test-me-please
LGTM! A few minor comments.
The API rate limiting system is capable of enforcing both rate limiting and a maximum number of parallel requests. Instead of enforcing static limits, the system can automatically adjust rate limits and allowed parallel requests by comparing the provided estimated processing duration with the observed mean processing duration.

Usage:

```
var requestLimiter = rate.NewAPILimiter("myRequest", rate.APILimiterParameters{
	SkipInitial:      5,
	RateLimit:        1.0, // 1 request/s
	ParallelRequests: 2,
}, nil)

func myRequestHandler() error {
	req, err := requestLimiter.Wait(context.Background())
	if err != nil {
		// request timed out while waiting
		return err
	}
	defer req.Done() // Signal that request has been processed

	// process request ...

	return nil
}
```

Configuration parameters:

- EstimatedProcessingDuration time.Duration
  EstimatedProcessingDuration is the estimated duration an API call will take. This value is used if AutoAdjust is enabled to automatically adjust rate limits to stay as close as possible to the estimated processing duration.
- AutoAdjust bool
  AutoAdjust enables automatic adjustment of the values ParallelRequests, RateLimit, and RateBurst in order to keep the mean processing duration close to EstimatedProcessingDuration.
- MeanOver int
  MeanOver is the number of entries to keep in order to calculate the mean processing and wait duration.
- ParallelRequests int
  ParallelRequests is the number of parallel requests allowed. If AutoAdjust is enabled, the value will adjust automatically.
- MaxParallelRequests int
  MaxParallelRequests is the maximum number of parallel requests allowed. If AutoAdjust is enabled, ParallelRequests will never grow above MaxParallelRequests.
- MinParallelRequests int
  MinParallelRequests is the minimum number of parallel requests allowed. If AutoAdjust is enabled, ParallelRequests will never fall below MinParallelRequests.
- RateLimit rate.Limit
  RateLimit is the initial number of API requests allowed per second. If AutoAdjust is enabled, the value will adjust automatically.
- RateBurst int
  RateBurst is the initial allowed burst of API requests. If AutoAdjust is enabled, the value will adjust automatically.
- MinWaitDuration time.Duration
  MinWaitDuration is the minimum time an API request always has to wait before the Wait() function returns an error.
- MaxWaitDuration time.Duration
  MaxWaitDuration is the maximum time an API request is allowed to wait before the Wait() function returns an error.
- Log bool
  Log enables info logging of processed API requests. This should only be used for low-frequency API calls. Example:

```
level="info" msg="Processing API request with rate limiter" maxWaitDuration=10ms name=foo parallelRequests=2 subsys=rate uuid=933267c5-01db-11eb-93bb-08002720ea43
level="info" msg="API call has been processed" name=foo processingDuration=10.020573ms subsys=rate totalDuration=10.047051ms uuid=933265c7-01db-11eb-93bb-08002720ea43 waitDurationTotal="18.665µs"
level=warning msg="Not processing API request. Wait duration for maximum parallel requests exceeds maximum" maxWaitDuration=10ms maxWaitDurationParallel=10ms name=foo parallelRequests=2 subsys=rate uuid=933269d2-01db-11eb-93bb-08002720ea43
```

- DelayedAdjustmentFactor float64
  DelayedAdjustmentFactor is the percentage of the AdjustmentFactor to be applied to RateBurst and MaxWaitDuration, defined as a value between 0.0..1.0. This is used to steer a slower reaction of RateBurst and ParallelRequests compared to RateLimit.
- SkipInitial int
  SkipInitial is the number of API calls to skip before applying rate limiting. This is useful to define a learning phase in the beginning to allow for auto adjustment before imposing wait durations on API calls.
- MaxAdjustmentFactor float64
  MaxAdjustmentFactor is the maximum adjustment factor when AutoAdjust is enabled. Base values will not adjust more than by this factor.

The configuration of API rate limiters is typically provided as code to establish defaults. A string-based configuration option can then be used to adjust the defaults. This allows exposing the configuration of rate limiting via a single option flag:

```go
l, err = NewAPILimiterSet(map[string]string{
	"foo": "rate-limit:2/m,rate-burst:2",
}, map[string]APILimiterParameters{
	"foo": {
		RateLimit:  rate.Limit(1.0 / 60.0),
		AutoAdjust: true,
	},
}, nil)
```

Signed-off-by: Thomas Graf <thomas@cilium.io>
Add default rate limiting for all endpoint related API calls with automatic adjustment based on estimated processing duration. Metrics are provided to monitor the rate limiting system:

```
cilium_api_limiter_adjustment_factor api_call="endpoint-create" 0.695787
cilium_api_limiter_processed_requests_total api_call="endpoint-create" outcome="success" 7.000000
cilium_api_limiter_processing_duration_seconds api_call="endpoint-create" value="estimated" 2.000000
cilium_api_limiter_processing_duration_seconds api_call="endpoint-create" value="mean" 2.874443
cilium_api_limiter_rate_limit api_call="endpoint-create" value="burst" 4.000000
cilium_api_limiter_rate_limit api_call="endpoint-create" value="limit" 0.347894
cilium_api_limiter_requests_in_flight api_call="endpoint-create" value="in-flight" 0.000000
cilium_api_limiter_requests_in_flight api_call="endpoint-create" value="limit" 0.000000
cilium_api_limiter_wait_duration_seconds api_call="endpoint-create" value="max" 15.000000
cilium_api_limiter_wait_duration_seconds api_call="endpoint-create" value="mean" 0.000000
cilium_api_limiter_wait_duration_seconds api_call="endpoint-create" value="min" 0.000000
```

Signed-off-by: Thomas Graf <thomas@cilium.io>
Force-pushed from 6e384e1 to c598865 (compare)
Changes made:
Force-pushed from c598865 to 0906012 (compare)
Signed-off-by: Thomas Graf <thomas@cilium.io>
Force-pushed from 0906012 to ec0f695 (compare)
test-me-please
Summary
When Cilium is subject to CPU or memory constraints, it is possible for Cilium to receive more API calls than it can handle, each of which results in required work. Eventually, Cilium may never get out of this situation, as the backpressure is insufficient. Several parallelization efforts are controlled by the number of available cores on the system, which can lead to wrong assumptions if Cilium is constrained to 1-2 cores.
API Rate Limiting
The API rate limiting system is capable of enforcing both rate limiting
and a maximum number of parallel requests. Instead of enforcing static
limits, the system can automatically adjust rate limits and allowed
parallel requests by comparing the provided estimated processing
duration with the observed mean processing duration.
Usage:
Configuration parameters:
EstimatedProcessingDuration time.Duration
EstimatedProcessingDuration is the estimated duration an API call
will take. This value is used if AutoAdjust is enabled to
automatically adjust rate limits to stay as close as possible to the
estimated processing duration.
AutoAdjust bool
AutoAdjust enables automatic adjustment of the values
ParallelRequests, RateLimit, and RateBurst in order to keep the mean
processing duration close to EstimatedProcessingDuration
MeanOver int
MeanOver is the number of entries to keep in order to calculate the
mean processing and wait duration
ParallelRequests int
ParallelRequests is the number of parallel requests allowed. If
AutoAdjust is enabled, the value will adjust automatically.
MaxParallelRequests int
MaxParallelRequests is the maximum number of parallel requests allowed.
If AutoAdjust is enabled, ParallelRequests will never grow above
MaxParallelRequests.
MinParallelRequests int
MinParallelRequests is the minimum number of parallel requests allowed.
If AutoAdjust is enabled, ParallelRequests will never fall below
MinParallelRequests.
RateLimit rate.Limit
RateLimit is the initial number of API requests allowed per second.
If AutoAdjust is enabled, the value will adjust automatically.
RateBurst int
RateBurst is the initial allowed burst of API requests. If AutoAdjust
is enabled, the value will adjust automatically.
MinWaitDuration time.Duration
MinWaitDuration is the minimum time an API request always has to wait
before the Wait() function returns an error.
MaxWaitDuration time.Duration
MaxWaitDuration is the maximum time an API request is allowed to wait
before the Wait() function returns an error.
Log bool
Log enables info logging of processed API requests. This should only
be used for low frequency API calls.
Example:
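The commit message above includes sample log output for this option:

```
level="info" msg="Processing API request with rate limiter" maxWaitDuration=10ms name=foo parallelRequests=2 subsys=rate uuid=933267c5-01db-11eb-93bb-08002720ea43
level="info" msg="API call has been processed" name=foo processingDuration=10.020573ms subsys=rate totalDuration=10.047051ms uuid=933265c7-01db-11eb-93bb-08002720ea43 waitDurationTotal="18.665µs"
level=warning msg="Not processing API request. Wait duration for maximum parallel requests exceeds maximum" maxWaitDuration=10ms maxWaitDurationParallel=10ms name=foo parallelRequests=2 subsys=rate uuid=933269d2-01db-11eb-93bb-08002720ea43
```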
DelayedAdjustmentFactor float64
DelayedAdjustmentFactor is the percentage of the AdjustmentFactor to be
applied to RateBurst and MaxWaitDuration, defined as a value between
0.0..1.0. This is used to steer a slower reaction of RateBurst
and ParallelRequests compared to RateLimit.
SkipInitial int
SkipInitial is the number of API calls to skip before applying rate
limiting. This is useful to define a learning phase in the beginning
to allow for auto adjustment before imposing wait durations on API
calls.
MaxAdjustmentFactor float64
MaxAdjustmentFactor is the maximum adjustment factor when AutoAdjust
is enabled. Base values will not adjust more than by this factor.
The configuration of API rate limiters is typically provided as code to
establish defaults. A string-based configuration option can then be used
to adjust the defaults. This allows exposing the configuration of rate
limiting via a single option flag:
New Default API Rate Limits