Rate limiting for S3 compatible block storage #5822
Thanks for reporting the issue.
I understand that the current behavior is not ideal. However, this is not an easy problem to solve since Cortex has multiple microservices and multiple replicas sending requests to the object storage at the same time. Thus, it is pretty hard to do rate limiting at client side since what you actually need is a global rate limiter across all your Cortex pods. From the error log provided, did you hit the rate limit from Store Gateway or other components? For most of the components I believe backoff and retry should be fine since they are not that latency sensitive. |
@jakubgs if you are using the mixin for Cortex, there is a dashboard for object storage that shows which component is making the requests. If you have it, I would like to see those panels to understand which components and which operations are getting errors.
That's correct; the log is from a host running 3 services as one node.
Sorry, I don't know what "mixin" is in this context.
The cortex mixin contains dashboards and alerts; you can find the latest release at https://github.com/cortexproject/cortex-jsonnet/releases
Oh, no, I have my own dashboard. What is the metric name? |
Thanks for sharing. It looks like you don't have that many requests, to be honest. The concerning ones are the querier and store-gateway errors. To reduce queries to block storage, make sure you have:
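The usual way to cut querier and store-gateway traffic to block storage is to enable the bucket-store caches. A hedged sketch of the relevant Cortex `blocks_storage` configuration, assuming memcached as the backend (the addresses are placeholders, and field names should be checked against the Cortex docs for your version):

```yaml
blocks_storage:
  bucket_store:
    index_cache:
      backend: memcached
      memcached:
        addresses: dns+memcached-index.cortex.svc.cluster.local:11211
    chunks_cache:
      backend: memcached
      memcached:
        addresses: dns+memcached-chunks.cortex.svc.cluster.local:11211
    metadata_cache:
      backend: memcached
      memcached:
        addresses: dns+memcached-metadata.cortex.svc.cluster.local:11211
```

With these caches warm, repeated index lookups, chunk reads, and bucket metadata scans are served from memcached instead of issuing GET requests against the object store.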
Describe the bug
We are using DigitalOcean Spaces, an S3-compatible storage solution, for storing metrics. This service limits the number of GET requests one can make to 800 per second. In situations where the cache is full, we have seen errors indicating that the limit of 800 requests per second has been reached.
Expected behavior
According to DigitalOcean support the correct behavior would be something like this:
The question is: would it make more sense to rate-limit requests being made to block storage, rather than hit the limit and have to back off from making requests for longer? Or is hitting the backoff the correct and simpler way to handle this?
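A client-side rate limiter for the first option would typically be a token bucket shared by all requests in one process. A minimal sketch in Go (hand-rolled for illustration; the type and numbers are not from Cortex, and as noted in the replies this only caps a single pod, not the whole fleet):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenBucket is a minimal token-bucket rate limiter: up to cap tokens
// may be consumed in a burst, refilled at rate tokens per second.
type tokenBucket struct {
	mu     sync.Mutex
	tokens float64
	cap    float64
	rate   float64 // tokens per second
	last   time.Time
}

func newTokenBucket(rate, capacity float64) *tokenBucket {
	return &tokenBucket{tokens: capacity, cap: capacity, rate: rate, last: time.Now()}
}

// allow reports whether one request may proceed right now.
func (b *tokenBucket) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.cap {
		b.tokens = b.cap
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	// 800 requests per second, matching the Spaces GET limit,
	// with a small burst of 10 for the sake of a short demo.
	tb := newTokenBucket(800, 10)
	allowed := 0
	for i := 0; i < 20; i++ {
		if tb.allow() {
			allowed++
		}
	}
	fmt.Println("allowed:", allowed) // roughly the burst size
}
```

In practice Go services would normally reach for golang.org/x/time/rate instead of hand-rolling this, but either way the limiter is per process: with N replicas each pod would have to be given limit/N, which is why the maintainers point out that a truly global rate limiter is the hard part.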