Add defensiveness coefficient, and simplify the controller into a P-only controller by AbdulRahmanAlHamali · Pull Request #875 · Shopify/semian

AbdulRahmanAlHamali · 2025-11-17T15:28:35Z

A few realizations from our experiments:

We should not only attempt to match the error rate. Our goal is to attempt to eliminate the error rate
Also, when the error rate drops, our P value should not immediately drop

This PR addresses this by introducing a "defensiveness" coefficient, making the P equation:

(error_rate - ideal_error_rate) - 1 / defensiveness * rejection_rate

I set the default value of defensiveness to 5, which means that:

The controller is willing to block 5 times as many requests as the error rate, in attempting to eliminate it (as opposed to only once as many, which we currently do)
When the error rate is eliminated, the rejection rate will go down gradually as well.

The other thing we noticed is that both the I and the D have little to no contribution to our performance. Thus, I removed them, significantly simplifying the code

fixes https://github.com/Shopify/resiliency/issues/6644 and https://github.com/Shopify/resiliency/issues/6645

Aguasvivas22

LGTM, however there might be an issue in PR 874.

The graphs for sudden error spikes and sustained load look off and when I ran the sudden spikes test, I see an output that shows 300 windows vs what should be only 30 windows.

I haven't been able to review the other PR so not sure if that's by design or a bug, so wanted to put this out there.

No issues with this PR though but if it is an error, than we'd want to fix that and regenerate graphs here.

UPDATED:
This is actually by design since in #874 we change to sliding windows and adjusting every second instead of every 10s.

anagayan · 2025-11-21T21:28:56Z

lib/semian/proportional_controller.rb

+    # P = (error_rate - ideal_error_rate) - (1/defensiveness) * rejection_rate
    # Note: P increases when error_rate increases
    #       P decreases when rejection_rate increases (providing feedback)
-    class PIDController
+    class ProcessController
      attr_reader :name, :rejection_rate

-      def initialize(kp:, ki:, kd:, window_size:, sliding_interval:, implementation:, initial_history_duration:,
+      def initialize(defensiveness:, window_size:, sliding_interval:, implementation:,


This is basically a coefficient for the rejection_rate. I'd prefer if we just treat it like that directly in the formula (without the 1/) and illustrate what the range of coefficients will represent. I'm also not a fan of the exact naming of this variable.

As an aside, if we are weighting the rejection_rate differently, is it worth considering doing something similar for the error_rate as well.

As an aside, if we are weighting the rejection_rate differently, is it worth considering doing something similar for the error_rate as well.

So a coefficient for the error rate would basically allow us to block less than the error rate if we want to. Which I guess is a possibility, I'm just not sure if there is value in doing that

anagayan · 2025-11-21T21:30:46Z

lib/semian/pid_controller.rb

-        @kp = kp  # Proportional gain
-        @ki = ki  # Integral gain
-        @kd = kd  # Derivative gain


I haven't done any experiments or have much basis, but given that the actual targeting equation P is changing I would think it might be worthwhile playing around with these values before removing I & D altogether.

If it works fine with just the P and that's good enough, then I'm fine with the simplification too

anagayan · 2025-11-25T15:22:23Z

lib/semian.rb

-      kd: 0.5, # Small derivative gain (as per design doc)
      window_size: 10, # 10-second window for rate calculation and update interval
      sliding_interval: 1, # 1-second interval for background health checks
-      initial_history_duration: 900, # 15 minutes of initial history for p90 calculation


Switch to SES algorithm made this obsolete. This is a cleanup

anagayan · 2025-11-25T15:28:58Z

lib/semian/pid_controller.rb

        initial_error_rate:,
        smoother_cap_value: SimpleExponentialSmoother::DEFAULT_CAP_VALUE)
-        # PID coefficients
-        @kp = kp  # Proportional gain


Discussion detailing that we should keep Kp value here and set it to 1

anagayan · 2025-11-25T15:30:23Z

lib/semian/pid_controller.rb

        initial_error_rate:,
        smoother_cap_value: SimpleExponentialSmoother::DEFAULT_CAP_VALUE)
-        # PID coefficients
-        @kp = kp  # Proportional gain


Let's keep Kp here and set it to 1 (based on discussion)

AbdulRahmanAlHamali · 2025-12-04T17:41:07Z

Replaced by #899

AbdulRahmanAlHamali force-pushed the sliding-implementation branch 11 times, most recently from 8dfcc48 to 20299bd Compare November 20, 2025 22:14

AbdulRahmanAlHamali force-pushed the rejection-rate branch from dccbfa6 to c66905d Compare November 21, 2025 05:40

AbdulRahmanAlHamali changed the title ~~only count P, and add rejection rate coefficient~~ Add defensiveness coefficient, and simplify the controller into a P-only controller Nov 21, 2025

AbdulRahmanAlHamali force-pushed the rejection-rate branch 3 times, most recently from 11bf32d to bc0f019 Compare November 21, 2025 13:50

AbdulRahmanAlHamali marked this pull request as ready for review November 21, 2025 13:52

Aguasvivas22 reviewed Nov 21, 2025

View reviewed changes

anagayan reviewed Nov 21, 2025

View reviewed changes

anagayan reviewed Nov 25, 2025

View reviewed changes

AbdulRahmanAlHamali force-pushed the sliding-implementation branch 2 times, most recently from 72d0208 to 22cd79b Compare November 26, 2025 16:47

AbdulRahmanAlHamali force-pushed the rejection-rate branch from bc0f019 to 04c50a7 Compare November 26, 2025 18:16

Base automatically changed from sliding-implementation to pid-take-2 November 26, 2025 18:16

AbdulRahmanAlHamali force-pushed the rejection-rate branch 2 times, most recently from 61cb004 to aa478a8 Compare November 26, 2025 20:29

AbdulRahmanAlHamali added 2 commits November 27, 2025 18:04

switch to a simple process controller with a defensiveness coefficient

dfcbccd

use proportional controller instead of process controller

e022640

AbdulRahmanAlHamali added 2 commits November 27, 2025 18:05

run experiments, and fix the one of many services experiment

bee208b

rerun experiments

a992aa6

AbdulRahmanAlHamali force-pushed the rejection-rate branch from 69b635f to a992aa6 Compare November 27, 2025 18:44

AbdulRahmanAlHamali closed this Dec 4, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add defensiveness coefficient, and simplify the controller into a P-only controller#875

Add defensiveness coefficient, and simplify the controller into a P-only controller#875
AbdulRahmanAlHamali wants to merge 4 commits intopid-take-2from
rejection-rate

AbdulRahmanAlHamali commented Nov 17, 2025 •

edited

Loading

Uh oh!

Aguasvivas22 left a comment •

edited

Loading

Uh oh!

anagayan Nov 21, 2025

Uh oh!

AbdulRahmanAlHamali Nov 26, 2025

Uh oh!

anagayan Nov 21, 2025

Uh oh!

anagayan Nov 25, 2025

Uh oh!

anagayan Nov 25, 2025

Uh oh!

anagayan Nov 25, 2025

Uh oh!

AbdulRahmanAlHamali commented Dec 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

AbdulRahmanAlHamali commented Nov 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Aguasvivas22 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anagayan Nov 21, 2025

Choose a reason for hiding this comment

Uh oh!

AbdulRahmanAlHamali Nov 26, 2025

Choose a reason for hiding this comment

Uh oh!

anagayan Nov 21, 2025

Choose a reason for hiding this comment

Uh oh!

anagayan Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

anagayan Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

anagayan Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

AbdulRahmanAlHamali commented Dec 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

AbdulRahmanAlHamali commented Nov 17, 2025 •

edited

Loading

Aguasvivas22 left a comment •

edited

Loading