Automatic rate limiting of endpoint API calls #13319

Merged
merged 3 commits into master from pr/tgraf/endpoint-create-rate-limit on Oct 2, 2020

Conversation

tgraf
Member

@tgraf tgraf commented Sep 29, 2020

Summary

When Cilium is subject to CPU or memory constraints, it can receive more API calls than it is able to handle, each of which results in required work. Because the backpressure is insufficient, Cilium may never get out of this situation. Several parallelization efforts are controlled by the number of available cores on the system, which can lead to wrong assumptions if Cilium is constrained to 1-2 cores.

API Rate Limiting

The API rate limiting system is capable of enforcing both rate limits
and a maximum number of parallel requests. Instead of enforcing static
limits, the system can automatically adjust the rate limits and allowed
parallel requests by comparing the provided estimated processing
duration with the observed mean processing duration.

Usage:

var requestLimiter = rate.NewAPILimiter("myRequest", rate.APILimiterParameters{
	SkipInitial:      5,
	RateLimit:        1.0, // 1 request/s
	ParallelRequests: 2,
}, nil)

func myRequestHandler() error {
	req, err := requestLimiter.Wait(context.Background())
	if err != nil {
		// request timed out while waiting
		return err
	}
	defer req.Done() // Signal that request has been processed

	// process request ....

	return nil
}

Configuration parameters:

  • EstimatedProcessingDuration time.Duration

    EstimatedProcessingDuration is the estimated duration an API call
    will take. This value is used if AutoAdjust is enabled to
    automatically adjust rate limits to stay as close as possible to the
    estimated processing duration.

  • AutoAdjust bool

    AutoAdjust enables automatic adjustment of the values
    ParallelRequests, RateLimit, and RateBurst in order to keep the mean
    processing duration close to EstimatedProcessingDuration (a combined
    example follows this list).

  • MeanOver int

    MeanOver is the number of entries to keep in order to calculate the
    mean processing and wait durations.

  • ParallelRequests int

    ParallelRequests is the number of parallel requests allowed. If
    AutoAdjust is enabled, the value will adjust automatically.

  • MaxParallelRequests int

    MaxParallelRequests is the maximum number of parallel requests
    allowed. If AutoAdjust is enabled, then ParallelRequests will never
    grow above MaxParallelRequests.

  • MinParallelRequests int

    MinParallelRequests is the minimum number of parallel requests
    allowed. If AutoAdjust is enabled, then ParallelRequests will never
    fall below MinParallelRequests.

  • RateLimit rate.Limit

    RateLimit is the initial number of API requests allowed per second.
    If AutoAdjust is enabled, the value will adjust automatically.

  • RateBurst int

    RateBurst is the initial burst of API requests allowed. If
    AutoAdjust is enabled, the value will adjust automatically.

  • MinWaitDuration time.Duration

    MinWaitDuration is the minimum time an API request always has to wait
    before the Wait() function returns.

  • MaxWaitDuration time.Duration

    MaxWaitDuration is the maximum time an API request is allowed to wait
    before the Wait() function returns an error.

  • Log bool

    Log enables info logging of processed API requests. This should only
    be used for low frequency API calls.

    Example:

    level="info" msg="Processing API request with rate limiter" maxWaitDuration=10ms name=foo parallelRequests=2 subsys=rate uuid=933267c5-01db-11eb-93bb-08002720ea43
    level="info" msg="API call has been processed" name=foo processingDuration=10.020573ms subsys=rate totalDuration=10.047051ms uuid=933265c7-01db-11eb-93bb-08002720ea43 waitDurationTotal="18.665µs"
    level=warning msg="Not processing API request. Wait duration for maximum parallel requests exceeds maximum" maxWaitDuration=10ms maxWaitDurationParallel=10ms name=foo parallelRequests=2 subsys=rate uuid=933269d2-01db-11eb-93bb-08002720ea43
    
  • DelayedAdjustmentFactor float64

    DelayedAdjustmentFactor is the percentage of the AdjustmentFactor to
    be applied to RateBurst and MaxWaitDuration, defined as a value
    between 0.0 and 1.0. This is used to steer a slower reaction of
    RateBurst and ParallelRequests compared to RateLimit.

  • SkipInitial int

    SkipInitial is the number of API calls to skip before applying rate
    limiting. This is useful for defining an initial learning phase that
    allows the auto adjustment to settle before imposing wait durations
    on API calls.

  • MaxAdjustmentFactor float64

    MaxAdjustmentFactor is the maximum adjustment factor when AutoAdjust
    is enabled. Base values will not be adjusted by more than this
    factor.
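
A minimal sketch combining several of these parameters into an
auto-adjusting limiter (the field names follow the list above; the 2s
estimate and the bounds are arbitrary example values, not defaults):

```go
var createLimiter = rate.NewAPILimiter("myCreate", rate.APILimiterParameters{
	AutoAdjust:                  true,             // keep mean duration near the estimate
	EstimatedProcessingDuration: 2 * time.Second,  // target processing duration
	RateLimit:                   0.5,              // 1 request every 2s initially
	RateBurst:                   4,
	ParallelRequests:            4,                // starting point, auto-adjusted
	MinParallelRequests:         2,                // never fall below 2 in parallel
	MaxParallelRequests:         8,                // never grow above 8 in parallel
	MaxWaitDuration:             15 * time.Second, // give up waiting after 15s
	SkipInitial:                 5,                // learning phase before limiting
}, nil)
```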

The configuration of API rate limiters is typically provided as code to
establish defaults. A string-based configuration option can then be used
to adjust those defaults. This allows the rate limiting configuration to
be exposed via a single option flag:

l, err = NewAPILimiterSet(map[string]string{
	"foo": "rate-limit:2/m,rate-burst:2",
}, map[string]APILimiterParameters{
	"foo": {
		RateLimit:  rate.Limit(1.0 / 60.0),
		AutoAdjust: true,
	},
}, nil)
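
A request handler then waits on the named limiter via the set, mirroring
the endpoint handler changes in this PR. The sketch below reuses the
"foo" limiter from the set above; the handler itself is illustrative:

```go
func fooHandler(ctx context.Context) error {
	// Block until the "foo" limiter admits the request, or fail once the
	// configured maximum wait duration is exceeded.
	req, err := l.Wait(ctx, "foo")
	if err != nil {
		// The endpoint handlers in this PR map this case to a 429
		// (responseTooManyRequests) API error.
		return err
	}
	defer req.Done() // signal completion so the mean processing duration is tracked

	// process request ...
	return nil
}
```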

New Default API Rate Limits

| API Call | Limit | Burst | Max Parallel | Min Parallel | Max Wait Duration | Auto Adjust | Estimated Processing Duration |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PUT /endpoint/{id} | 0.5/s | 4 | 4 | | 15s | True | 2s |
| DELETE /endpoint/{id} | | | 4 | 4 | | True | 200ms |
| GET /endpoint/{id}/* | 4/s | 4 | 4 | 2 | 10s | True | 200ms |
| PATCH /endpoint/{id}* | 0.5/s | 4 | 4 | | 15s | True | 1s |
| GET /endpoint | 1/s | 4 | 2 | 2 | | True | 300ms |
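
If one of these defaults needs to be relaxed, the string-based
configuration shown above can override individual fields per limiter. A
hypothetical override of the endpoint creation limits (the
"endpoint-create" key is taken from the metrics further down in this PR;
`defaultLimiterParams` stands in for the in-code defaults):

```go
// Sketch only: relax the endpoint creation limits at runtime while keeping
// all other defaults as defined in code. defaultLimiterParams is a
// hypothetical map of the in-code APILimiterParameters defaults.
overrides := map[string]string{
	"endpoint-create": "rate-limit:120/m,rate-burst:8", // 2 requests/s
}
l, err := NewAPILimiterSet(overrides, defaultLimiterParams, nil)
```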

@tgraf tgraf added kind/bug This is a bug in the Cilium logic. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Sep 29, 2020
@tgraf tgraf requested a review from a team September 29, 2020 08:54
@tgraf tgraf requested review from a team as code owners September 29, 2020 08:54
@tgraf tgraf marked this pull request as draft September 29, 2020 08:54
@tgraf tgraf force-pushed the pr/tgraf/endpoint-create-rate-limit branch from 4768bd3 to 15ee965 Compare September 29, 2020 09:42
@tgraf
Member Author

tgraf commented Sep 29, 2020

test-me-please

@tgraf tgraf marked this pull request as ready for review September 29, 2020 19:50
@tgraf tgraf requested a review from a team as a code owner September 29, 2020 19:50
@tgraf
Member Author

tgraf commented Sep 29, 2020

test-me-please

@tgraf tgraf added priority/high This is considered vital to an upcoming release. priority/release-blocker labels Sep 29, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.8.4 Sep 29, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.7.10 Sep 29, 2020
pkg/rate/api_limiter.go (outdated review thread, resolved)
Member

@tklauser tklauser left a comment


Some small nits while skimming through the implementation. Great documentation - in godoc as well as the Cilium docs!

There were a few things around the estimated duration, though, that I was wondering about and didn't find an answer to:

  • How were the estimated processing durations determined?
  • How should a potential future user of this API estimate the duration? Might make sense to add instructions to the docs?
  • What happens if the estimation is off significantly?

daemon/cmd/api_limits.go (outdated review thread, resolved)
pkg/rate/api_limiter.go (review thread, resolved)
@@ -137,11 +148,19 @@ func NewGetEndpointIDHandler(d *Daemon) GetEndpointIDHandler {
func (h *getEndpointID) Handle(params GetEndpointIDParams) middleware.Responder {
log.WithField(logfields.EndpointID, params.ID).Debug("GET /endpoint/{id} request")

r, err := h.d.apiLimiterSet.Wait(params.HTTPRequest.Context(), apiRequestEndpointGet)
if err != nil {
return api.Error(responseTooManyRequests, err)

@PinchasLev PinchasLev Sep 30, 2020


@tgraf IIUC there are primarily two paths for a failed API request: 1. the request exceeds the maximum wait time, 2. the request exceeds the rate limiting settings. The latter will return immediately with a return code of responseTooManyRequests (429), whereas the former will return with another failure.

My questions, as a user, are: does Cilium somehow update the status on the pod to reflect the cause of the error? Secondly, since rate limiting is intended as a means to reduce pressure on the cilium agent, what is the recommended course of action for the user upon receiving responseTooManyRequests? Again, my understanding is that the user has two options: either to retry after a short amount of time, or to adjust the rate-limiting configuration and then retry immediately. My issue with the second approach is that the end user (the one who initiated the pod creation) typically won't have the ability to make cluster-wide changes. Hence, my question is: what is the mode for interrogation and the passing of these failures and settings across the multiple components (end user, k8s operator, Cilium)?

Member Author


does Cilium somehow update the status on the pod to reflect the cause of the error?

The error will surface in kubectl describe pod and kubectl get events, so you will see that CNI creation failed because of too many requests. This should give you better visibility than before, instead of the kubelet/runtime timing out and retrying silently.

Secondly, since rate limiting is intended as a means to reduce pressure on the cilium agent, what is the recommended course of action for the user upon receiving responseTooManyRequests?

In general it should resolve itself automatically as kubelet will continue to retry. It will avoid the dangerous CNI ADD / CNI DEL cycle in which some resources have already been invested and then have to be undone.

Again, my understanding is that the user has two options: either to retry after a short amount of time,

kubelet will take care of this. The difference is that it is done with visibility and fails upfront, before CPU resources have already been committed.

or to adjust the rate-limiting configuration and then retry immediately.

Adjustment of the rate-limiting configuration should only be needed if the automatic adjustment doesn't allow for the required number of pod creations (or other API calls) even when low processing durations are observed. If enough resources are available, the default rate limits will automatically be relaxed further. An example where things would not work out well automatically: if an API call is estimated to take 2 seconds with 4 parallel requests but it only takes 0.5 seconds and you really need 8 parallel requests at all times. The auto adjustment won't allow for this and manual intervention is required to adjust the limits.

Another example is if the endpoint creation is much slower than expected, let's say it takes 10s instead of 2s. Then only one parallel endpoint request will be allowed as ParallelRequests will be shrunk down automatically.
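
For a rough sense of the arithmetic (assuming the adjustment factor is essentially the ratio of estimated to observed mean processing duration, bounded by MaxAdjustmentFactor): with a 2s estimate and a 10s observed mean, the factor is 2/10 = 0.2, so a base of 4 parallel requests would scale down to 4 × 0.2 = 0.8 and then be clamped to the minimum, leaving a single endpoint creation in flight.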

Based on the metrics, it should be possible to tell immediately whether automatically adjusted limits are reasonable or not. We will be happy to optimize with you.

My issue with the second approach is that the end user--the one who initiated the pod creation--typically won't have the ability to make cluster wide changes. Hence, my question is, what is the mode for interrogation and the passing of these failures and settings across the multiple components (end-user, k8s operator, cilium)?

Agreed. The user should not really see this, similar to how a k8s user typically doesn't see when the k8s apiserver returns 429. It does; retries are done automatically. If too many retries are needed, the platform operator has to step in and resize the VM, adjust the limits, and so on. It will be the same with these limits. Observation should be done via the metrics so you can see the mean waiting duration; if that gets out of hand, you can check why that is and determine whether to adjust the limits or fix the resource constraints.

@tgraf tgraf force-pushed the pr/tgraf/endpoint-create-rate-limit branch from ed1a400 to dc7beb0 Compare September 30, 2020 14:59
@tgraf tgraf changed the title API rate limiting Automatic rate limiting of endpoint API calls Sep 30, 2020
@tgraf
Member Author

tgraf commented Sep 30, 2020

test-me-please

@joestringer joestringer removed this from Needs backport from master in 1.8.4 Sep 30, 2020
@joestringer joestringer removed this from Needs backport from master in 1.7.10 Sep 30, 2020
@joestringer joestringer added this to Needs backport from master in 1.7.11 Sep 30, 2020
@joestringer joestringer added this to Needs backport from master in 1.8.5 Sep 30, 2020
Member

@christarazi christarazi left a comment


LGTM! A few minor comments.

pkg/rate/api_limiter.go (two outdated review threads, resolved)
Add default rate limiting for all endpoint-related API calls with
automatic adjustment based on estimated processing duration.

Metrics are provided to monitor the rate limiting system:
```
cilium_api_limiter_adjustment_factor                  api_call="endpoint-create"                               0.695787
cilium_api_limiter_processed_requests_total           api_call="endpoint-create" outcome="success"             7.000000
cilium_api_limiter_processing_duration_seconds        api_call="endpoint-create" value="estimated"             2.000000
cilium_api_limiter_processing_duration_seconds        api_call="endpoint-create" value="mean"                  2.874443
cilium_api_limiter_rate_limit                         api_call="endpoint-create" value="burst"                 4.000000
cilium_api_limiter_rate_limit                         api_call="endpoint-create" value="limit"                 0.347894
cilium_api_limiter_requests_in_flight                 api_call="endpoint-create" value="in-flight"             0.000000
cilium_api_limiter_requests_in_flight                 api_call="endpoint-create" value="limit"                 0.000000
cilium_api_limiter_wait_duration_seconds              api_call="endpoint-create" value="max"                  15.000000
cilium_api_limiter_wait_duration_seconds              api_call="endpoint-create" value="mean"                  0.000000
cilium_api_limiter_wait_duration_seconds              api_call="endpoint-create" value="min"                   0.000000
```

Signed-off-by: Thomas Graf <thomas@cilium.io>
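
For reference, the example metrics above hang together: the observed mean processing duration (≈2.87s) exceeds the 2s estimate, the reported adjustment factor (0.695787) matches the ratio of the two (2.0 / 2.874443 ≈ 0.696), and the base rate limit of 0.5/s scaled by that factor gives the reported limit of ≈0.3479 requests/s.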
@tgraf tgraf force-pushed the pr/tgraf/endpoint-create-rate-limit branch from 6e384e1 to c598865 Compare October 2, 2020 13:56
@tgraf
Member Author

tgraf commented Oct 2, 2020

Changes made:

  • The adjustment factor is now applied to the base value instead of the latest calculation. This resolves an issue with continuous correction when the estimated duration cannot be achieved. With this, the default delayed adjustment has been changed from 0.25 to 0.5 due to slower correction.
  • The SkipInitial logic now applies to scheduled requests and not processed requests to only skip N requests instead of waiting for N requests to complete. If many requests got scheduled quickly, the two values could deviate a lot.
  • The documentation now mentions the default values.

@tgraf tgraf force-pushed the pr/tgraf/endpoint-create-rate-limit branch from c598865 to 0906012 Compare October 2, 2020 13:59
Signed-off-by: Thomas Graf <thomas@cilium.io>
@tgraf tgraf force-pushed the pr/tgraf/endpoint-create-rate-limit branch from 0906012 to ec0f695 Compare October 2, 2020 14:04
@tgraf
Member Author

tgraf commented Oct 2, 2020

test-me-please

@tgraf tgraf merged commit d315ec3 into master Oct 2, 2020
@tgraf tgraf deleted the pr/tgraf/endpoint-create-rate-limit branch October 2, 2020 16:43
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.7 in 1.7.11 Oct 2, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.8 in 1.8.5 Oct 6, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.5 Oct 7, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.5 Oct 7, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.7 to Backport done to v1.7 in 1.7.11 Oct 7, 2020