RFC: Rate limiting (throughput quota) design
When multiple tenants are hosted on the same cluster, we should be able to limit the impact that a single tenant (one that experiences unplanned high volume or has an errant client) has on the other tenants of the cluster.
We want to limit all request types: get, put, getAll, and delete.
We specify a soft limit and a hard limit. Upon violation of a soft limit, we register the violating store in a JMX getter, allowing monitoring tools to alert the owners of the store (this is the approach used by the disk quota subsystem).
Upon violation of a hard limit, we ban the store for a specified period of time.
Count the number of requests that arrive during each second. A store is considered to be violating a limit as soon as its number of requests per second exceeds that limit. This uses the standard RequestCounter class from Voldemort (the statistics are kept for a whole interval and are accumulated during the next interval).
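The per-second counting rule can be sketched as a fixed one-second window with two thresholds. This is only an illustration: `StoreQuotaWindow` and `Verdict` are hypothetical names, the class does not use or reproduce Voldemort's RequestCounter, and the caller passes in the current time so that the behavior is easy to test.

```java
// Hypothetical sketch: one instance per store. Not Voldemort's RequestCounter.
final class StoreQuotaWindow {
    enum Verdict { OK, SOFT_VIOLATION, HARD_VIOLATION }

    private final long softLimit;   // requests per second before alerting
    private final long hardLimit;   // requests per second before banning
    private long currentSecond = -1;
    private long count = 0;

    StoreQuotaWindow(long softLimit, long hardLimit) {
        if (softLimit > hardLimit) {
            throw new IllegalArgumentException("soft limit must not exceed hard limit");
        }
        this.softLimit = softLimit;
        this.hardLimit = hardLimit;
    }

    // Records one request at time nowMs and reports the store's status.
    synchronized Verdict record(long nowMs) {
        long second = nowMs / 1000;
        if (second != currentSecond) {
            // A new one-second window has started: reset the counter.
            currentSecond = second;
            count = 0;
        }
        count++;
        if (count > hardLimit) {
            return Verdict.HARD_VIOLATION;
        }
        if (count > softLimit) {
            return Verdict.SOFT_VIOLATION;
        }
        return Verdict.OK;
    }
}
```

With a soft limit of 2 and a hard limit of 3, the third request within the same second yields a soft violation and the fourth a hard violation; the first request of the next second starts from a clean count.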
There are multiple approaches to enforcing the ban. One approach (approach A) is to respond to any request from the client with an application exception. The network requests would still reach the server, and the server would still reply. This is “load shedding”: the load is shed at the higher levels, before the keys and values are deserialized and before the disk is touched. Note that this may still be insufficient.
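A minimal sketch of approach A follows, assuming a handler that knows the store name before it touches the payload. All names here (`SheddingHandler`, `QuotaExceededException`, `ban`, `handle`) are hypothetical and are not Voldemort's request-handling API; decoding the payload as a string stands in for the real deserialization and storage work.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical application-level exception returned to a banned client.
final class QuotaExceededException extends RuntimeException {
    QuotaExceededException(String store) {
        super("Quota exceeded for store " + store);
    }
}

// Hypothetical handler that sheds load before any expensive work is done.
final class SheddingHandler {
    // Store name -> time (ms) at which the ban expires.
    private final Map<String, Long> bannedUntilMs = new ConcurrentHashMap<>();
    // Counts how often the expensive path ran (for illustration only).
    int payloadsDeserialized = 0;

    void ban(String store, long nowMs, long banDurationMs) {
        bannedUntilMs.put(store, nowMs + banDurationMs);
    }

    String handle(String store, byte[] payload, long nowMs) {
        Long until = bannedUntilMs.get(store);
        if (until != null) {
            if (nowMs < until) {
                // Reject early: the payload is never deserialized.
                throw new QuotaExceededException(store);
            }
            bannedUntilMs.remove(store); // the ban has expired
        }
        payloadsDeserialized++;
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

The request still costs a network round trip and an exception response, which is why this form of shedding may not be enough on its own.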
Another approach (approach B) is to send a hard exception back to the client so as to cause the client’s failure detector to mark the server as down and stop sending requests. This is somewhat coarse-grained: all the clients from the same StoreClientFactory become banned. However: 1) a StoreClientFactory is usually associated with a single application; 2) this prevents the client from sending any data to the server, so the “banned” traffic places no additional burden on the server.
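The client-side effect of approach B can be illustrated with a toy failure detector. This is a sketch under assumptions: `ToyFailureDetector` and its methods are hypothetical and far simpler than Voldemort's actual failure detectors, and the retry interval is an invented parameter. The detector is keyed by node rather than by store, which is what makes the ban coarse-grained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy detector shared by every client of one factory.
final class ToyFailureDetector {
    // Node id -> time (ms) until which the node is treated as down.
    private final Map<Integer, Long> downUntilMs = new HashMap<>();
    private final long retryAfterMs;

    ToyFailureDetector(long retryAfterMs) {
        this.retryAfterMs = retryAfterMs;
    }

    // Called when a request to the node fails with a hard exception.
    synchronized void recordHardFailure(int nodeId, long nowMs) {
        downUntilMs.put(nodeId, nowMs + retryAfterMs);
    }

    // Clients check this before sending; a down node receives no traffic.
    synchronized boolean isAvailable(int nodeId, long nowMs) {
        Long until = downUntilMs.get(nodeId);
        return until == null || nowMs >= until;
    }
}
```

Once a node is marked down, no client sharing the detector sends it anything until the interval elapses, so the server sees no traffic at all from the banned application.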
So far, approach A has been implemented and integration tested. What is still needed is a manual integration test of approach A, and perhaps a side-by-side comparison with approach B.