
Add Cdo::Throttle module #38142

Merged 2 commits into staging-next from cdo-throttle on Dec 9, 2020

Conversation

@maddiedierker (Contributor) commented Dec 6, 2020

Adds a simple module that tracks whether or not a particular ID should be throttled.

Cdo::Throttle.throttle uses CDO.shared_cache (which is backed by ElastiCache and shared across all instances) to track timestamps for the given ID, then uses the provided limit, period, and throttle_time arguments to determine how many requests are allowed in the given timeframe and, if requests exceed that threshold, how long that ID should be throttled.

Requests are also tracked while the ID is throttled, so making requests while throttled means that ID could stay throttled even after the original throttle_time has passed.

Important note: Because this module uses IDs to track usage, be sure the ID you provide is unique so it doesn't overwrite tracking data for unrelated IDs.

Example: Cdo::Throttle.throttle("profanity/a1b2c3", 10, 60, 20) -- the caller tracked by ID profanity/a1b2c3 can make 10 requests in 60 seconds before it is throttled. Once throttled, it stays throttled for 20 seconds.
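
For illustration, the core read-modify-write flow looks roughly like the sketch below, reconstructed from the diff fragments quoted in the review comments further down. The pieces not visible in those fragments (the throttled_until check, the limit comparison, the inlined empty value, and the two constants) are assumptions, not the exact implementation:

module Cdo
  module Throttle
    CACHE_PREFIX = "throttle/".freeze  # placeholder value
    DEFAULT_THROTTLE_TIME = 60         # placeholder default, in seconds

    # @returns [Boolean] Whether or not the request should be throttled.
    def self.throttle(id, limit, period, throttle_for = DEFAULT_THROTTLE_TIME)
      now = Time.now.to_i
      full_key = CACHE_PREFIX + id.to_s
      value = CDO.shared_cache.read(full_key) || {request_timestamps: [], throttled_until: nil}

      if value[:throttled_until] && now < value[:throttled_until]
        # Still inside an earlier throttle window.
        should_throttle = true
      else
        value[:throttled_until] = nil
        # Evict timestamps that have fallen outside the sliding window.
        earliest = now - period
        value[:request_timestamps].select! {|timestamp| timestamp >= earliest}
        should_throttle = value[:request_timestamps].size >= limit
      end

      # Requests are tracked even while throttled, which can extend the throttle.
      value[:request_timestamps] << now
      value[:throttled_until] = now + throttle_for if should_throttle

      CDO.shared_cache.write(full_key, value)
      should_throttle
    end
  end
end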

Also see example usage in #38143.

Links

Testing story

Adds unit tests.

Reviewer Checklist:

  • Tests provide adequate coverage
  • Privacy and Security impacts have been assessed
  • Code is well-commented
  • New features are translatable or updates will not break translations
  • Relevant documentation has been added or updated
  • User impact is well-understood and desirable
  • Pull Request is labeled appropriately
  • Follow-up work items (including potential tech debt) are tracked and linked

@ajpal (Contributor) left a comment

Not familiar with this area, but this looks good and well-described in the PR description

@maddiedierker merged commit 5cea064 into staging-next on Dec 9, 2020
@maddiedierker deleted the cdo-throttle branch on December 9, 2020 at 18:18

@jamescodeorg (Contributor) left a comment

I love the simplicity and clarity of this design, but I think that does impose some limitations on how it can be used. Let me know if you'd like to chat about this more.

value[:throttled_until] = now + throttle_for if should_throttle
end

CDO.shared_cache.write(full_key, value)

Contributor:

There's a race condition here if several different processes are all reading from and writing to the value at the same time, which will undercount the number of requests. (That might be acceptable given the use case, but it would be a good limitation to note.)

Contributor Author:

ah, yeah, you're right. i think it's acceptable for the current consumers (the client is caching responses and the server is doing some caching as well, so this is only called for unique/uncached requests), but i'd like to fix this to future-proof it.

i think the solution for this would be to implement a mutex or a queue here, but are there other/better solutions? i worry about a mutex because i think it would have to lock for everybody (rather than being able to lock per ID), but maybe that's okay -- i'm not super familiar with implementing/using locks in this way

Contributor:

If we're using Redis, then this could be a good way to implement it: https://redislabs.com/redis-best-practices/basic-rate-limiting/. (The behavior would be a bit different from what you've defined here, though.)

A local lock probably wouldn't help because it's still local to an instance and a distributed lock is difficult to implement. Two viable patterns are atomic operations (such as increment) at the cache server or optimistic concurrency (where you make sure the value that you're writing hasn't changed since you read it).
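
For example, the atomic-increment approach from that link could look roughly like this (a sketch assuming direct access to Redis via the redis gem rather than Rails' cache abstraction; it implements a fixed per-period window rather than the sliding window above):

require 'redis'

# Fixed-window rate limiting via an atomic INCR, per the Redis best-practices
# pattern linked above. Because the increment happens atomically on the server,
# concurrent processes can't undercount the way a read-modify-write cycle can.
def throttled?(redis, id, limit, period = 60)
  window = Time.now.to_i / period
  key = "throttle/#{id}/#{window}"
  count = redis.incr(key)
  redis.expire(key, period * 2) if count == 1  # let stale windows expire
  count > limit
end

# e.g. throttled?(Redis.new, "profanity/a1b2c3", 10)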

else
value[:throttled_until] = nil
earliest = now - period
value[:request_timestamps].select! {|timestamp| timestamp >= earliest}

Contributor:

I'm slightly worried about this. What kind of limit values do you think we will see in practice? Throttling functions typically need to be really fast, and this design grows linearly in time and space with the number of requests tracked.

Contributor Author:

yeah, this is the part that i feel weird about as well. the limit values are currently 100 requests / 60 seconds for identifiable users and 1000 requests / 60 seconds for unidentifiable users:

# Allowed number of unique requests per minute before that client is throttled.
# These values are fallbacks for DCDO-configured values used below.
REQUEST_LIMIT_PER_MIN_DEFAULT = 100
REQUEST_LIMIT_PER_MIN_IP = 1000

the limits are configurable via DCDO values for each consumer. when a consumer is throttled, i am logging to honeybadger so we're notified and know if we need to adjust those limits
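
For illustration, reading a DCDO-configured limit with the constant as a fallback might look like this (a sketch; the DCDO key name and the user_id interpolation are hypothetical):

# Hypothetical key name; the constant above is the fallback when no
# dynamic value has been configured in DCDO.
limit = DCDO.get('throttle_request_limit_per_min', REQUEST_LIMIT_PER_MIN_DEFAULT)
Cdo::Throttle.throttle("profanity/#{user_id}", limit, 60)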

Contributor Author:

do you have any thoughts for how to improve this? i'm trying to keep this array of timestamps small by evicting any values outside of the period (currently 60 seconds for all consumers) to mitigate this, but there may be a better way to do it (or to track the requests differently to avoid this problem entirely)

Contributor:

One way to solve this is to track counts in buckets instead of the exact timestamps. Each bucket would represent the number of requests that arrived during a particular window (say, 5 seconds) and you could sum up enough buckets to get the count for the interval that you want. Old buckets would automatically age out of the interval as time progresses. (Sorry if that's not super clear, this might be easier to explain with a sketch if it doesn't make sense.)
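
For example, a rough sketch of the bucket idea (the 5-second width and the hash representation are illustrative, not from the PR):

BUCKET_WIDTH = 5 # seconds per bucket

# Record one request and return the request count over the last `period`
# seconds. Storage is bounded by period / BUCKET_WIDTH buckets, no matter
# how many requests arrive.
def record_and_count(buckets, now, period)
  bucket_start = now - (now % BUCKET_WIDTH)
  buckets[bucket_start] = buckets.fetch(bucket_start, 0) + 1
  buckets.delete_if {|start, _count| start < now - period}  # age out old buckets
  buckets.values.sum
end

# buckets = {}   # e.g. persisted under the throttle key in CDO.shared_cache
# throttled = record_and_count(buckets, Time.now.to_i, 60) > limit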

# @returns [Boolean] Whether or not the request should be throttled.
def self.throttle(id, limit, period, throttle_for = throttle_time)
full_key = CACHE_PREFIX + id.to_s
value = CDO.shared_cache.read(full_key) || empty_value

Contributor:

Does this result in a network call? How is failure handled (e.g. if the shared cache is down)?

Contributor Author:

yes, this will result in a network call if CDO.shared_cache is using ElastiCache (it's the default, but falls back to a FileStore cache if initializing ElastiCache fails here).

if i'm reading the implementation correctly, nil will be returned on failure. this is rails' implementation of read, which calls the subclass' implementation of read_entry, which would be this implementation when using ElastiCache.

Contributor:

I think we may want to make this more explicit and just bypass the throttler if ElastiCache is temporarily unavailable?
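
For example, one shape for an explicit fail-open path (a sketch; the stand-in method name and the rescued error class are assumptions, and it presumes cache errors are allowed to propagate rather than being swallowed as nil):

module Cdo
  module Throttle
    # Sketch: wrap the cache-backed check so a cache outage never blocks requests.
    def self.throttle_with_fallback(id, limit, period, throttle_for)
      throttled_by_cache?(id, limit, period, throttle_for)  # stands in for the existing logic
    rescue StandardError => exception
      Honeybadger.notify(exception)  # reuse the Honeybadger reporting mentioned above
      false  # fail open: treat the request as not throttled
    end
  end
end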
