Sliding window implementation for PID controller#874
Sliding window implementation for PID controller#874AbdulRahmanAlHamali merged 8 commits intopid-take-2from
Conversation
7bcf49c to
4a3b0de
Compare
| @@ -0,0 +1,97 @@ | |||
| # frozen_string_literal: true | |||
There was a problem hiding this comment.
Is there a reason simple_sliding_window.rb wouldn't work here?
There was a problem hiding this comment.
Or maybe this could inherit from it
There was a problem hiding this comment.
hmmm, I'll look at its implementation, but we'll have to use it in a slightly different way: the classic circuit breaker only slides the window when it receives a new response, but we want to slide it continuously every second. So it will probably need some tweaking
ce2ef9f to
1f6fa64
Compare
| initial_history_duration: 900, # 15 minutes of initial history for p90 calculation | ||
| initial_error_rate: options[:initial_error_rate] || 0.01, # 1% error rate for initial p90 calculation | ||
| thread_safe: Semian.thread_safe?, | ||
| implementation: implementation(**options), |
There was a problem hiding this comment.
Not exactly part of the PR, but just making the implementation more consistent with the rest of the repo, where they define an implementation and it's either "Simple" or "ThreadSafe"
|
|
||
| @last_p_value = 0.0 | ||
| end | ||
| module Simple |
There was a problem hiding this comment.
Tip: set "Hide Whitespaces" to true while reviewing this file
lib/semian/pid_controller.rb
Outdated
| initial_value: initial_error_rate, | ||
| ) | ||
|
|
||
| @errors = implementation::SlidingWindow.new |
There was a problem hiding this comment.
I could've technically put these three into 1 array with entries looking like:
{type:, timestamp}
But this is more efficient since we later have to scan the array to decide how many errors vs. successes, while this way we could simply look at sizes
| # Clean up old observations | ||
| current_timestamp = current_time | ||
| cutoff_time = current_timestamp - @window_size | ||
| @errors.reject! { |timestamp| timestamp < cutoff_time } |
There was a problem hiding this comment.
this could be implemented more efficiently, since we know entries are ordered by time. So if found to be too slow, we can optimize
lib/semian/simple_sliding_window.rb
Outdated
|
|
||
| def push(value) | ||
| resize_to(@max_size - 1) # make room | ||
| resize_to(@max_size - 1) unless @max_size.nil? # make room |
There was a problem hiding this comment.
I think max_size makes sense for the classic, because we know exactly how many requests we want to store before opening the circuit. For adaptive, we don't know that, so I elected to make them unlimited.
This should be ok, since we don't expect to see too many requests to a single dependency. But we could add an optional max_size later if we see issues
There was a problem hiding this comment.
I agree it might not be as concrete as the classic, but just thinking out loud it would be good to have some limit to prevent unbound growth
There was a problem hiding this comment.
agreed. Do you have a number in mind? Or should we just put some large value, e.g. 1000 or something?
There was a problem hiding this comment.
Ideally we should choose a setting that increases/decreases with window size. I'm guessing 100 x window_size. So for window_size = 10, that would be a 1000.
|
|
||
| assert_equal(0, @controller.metrics[:rejection_rate]) | ||
| end | ||
| # def test_update_flow |
There was a problem hiding this comment.
These tests have to be fixed. However, in the next PR, I'm going to simplify the controller, from a PID controller to a simple P controller as discussed. So I'm not going to invest in this now
29fc9e8 to
a3130b9
Compare
8dfcc48 to
20299bd
Compare
|
|
||
| def wait_for_window | ||
| Kernel.sleep(@window_size) | ||
| Kernel.sleep(@sliding_interval) |
There was a problem hiding this comment.
Shouldn't this still be the window size?
There was a problem hiding this comment.
I don't think so, if we're sliding every second, we would want to wake the thread up every second, but then look back at the entire window size
20299bd to
72d0208
Compare
| @@ -0,0 +1,97 @@ | |||
| # frozen_string_literal: true | |||
There was a problem hiding this comment.
Or maybe this could inherit from it
lib/semian/pid_controller.rb
Outdated
| observations_per_minute: 60 / sliding_interval, | ||
| ) | ||
|
|
||
| @errors = implementation::SlidingWindow.new(max_size: 100 * window_size) |
There was a problem hiding this comment.
What is the meaning of multiplying by 100?
There was a problem hiding this comment.
basically I'm allowing up to a 100 requests per second to be stored. So a 1000 requests in a 10-second time window.
It's an arbitrary limit. Just not to keep things unbound
72d0208 to
22cd79b
Compare
| def calculate_window_error_rate | ||
| total_requests = @current_window_requests[:success] + @current_window_requests[:error] | ||
| return 0.0 if total_requests == 0 | ||
| total_requests = @successes.size + @errors.size |
There was a problem hiding this comment.
Why doesn't total requests include the rejections? I feel like there was a reason for this, but I'm blanking on it
There was a problem hiding this comment.
you want to find the error rate in the requests that did actually make it to the backend
Our tests have shown the discrete window implementation to be too janky, with rejection rate dropping too aggressively. Sliding window provides better smoothness.
fixes https://github.com/Shopify/resiliency/issues/6615