Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add options for statistical tracking of success #20

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jamiemccarthy
Copy link

Purpose

Vox Media, LLC has been making use of a modified version of the breakers gem, in our primary internet-facing web application, for about four years now. This PR submission is a subset of our modifications, as described in #19 .

These modifications improve the efficiency of breakers, and on busy systems they improve it dramatically, at the cost of some accuracy. The efficiency-vs-accuracy tradeoff can be dialed to any ratio desired.

The underlying assumption is that success is the "99% case," and that it's worth optimizing for the 99% case.

Changes

Specifically, when using breakers with this PR's modifications, three new options may be specified in Breakers::Service configuration:

  • Each successful Faraday request sends INCR to redis only p% of the time, but it increments the stored value by 1/p%, rather than sending a separate increment of 1 with each request.

    • So if p == 0.05, the client sends an INCR 20, but it randomly sends it 1 time out of 20 successful requests. This would cut write traffic by a factor of 20.
  • The middleware sends a ZRANGE check to redis only once every s seconds, in any given ruby process, rather than a separate check with every Faraday request.

    • This is most useful when a client can make multiple Faraday requests to the same service in rapid succession. Since within each process, such access is often extremely bursty, checking at the start of a burst is most likely to reveal new information.
  • A minimum number of errors e must be observed before an outage is reported.

    • When a minimum level of traffic to the remote services can be assumed, this helps smooth out any glitches due to the randomness of p.

The names of these configuration options are:

  • p is success_sample_per and defaults to 1.0 (no sampling)
  • s is seconds_between_outage_checks and defaults to 0 (always check for existing outage)
  • e is min_errors and defaults to 1 (always check for creating outage)

Those default values ensure exactly the same behavior for Breakers::Service as in existing versions of the gem, so this PR is 100% backwards compatible with existing installations.

(Vox Media happens to have set those values at 0.1, 10, and 100, respectively, for the past several years.)

Tests

New tests have been added for each of these three features to spec/integration_spec.rb.

I've verified that the tests pass when run on ruby 2.7.x (and bundler 2.x), and on faraday from 0.11.0 to 1.1.0. I haven't attempted to test against other versions, but I'd be surprised if it doesn't work on ruby 2.3+ and the existing faraday version spec. (If you'd be interested, I'd be happy to submit a separate PR to take advantage of GitHub Actions, Docker, and the appraisal gem to run the tests against a matrix of versions.)

Notes

We've also pushed our work to the now-public repository https://github.com/voxmedia/breakers . This PR is a curated subset of that work. If you're interested, our primary web app is currently running at the version in that repo tagged v1.0.1. Questions or requests for changes are welcome! We'd like to thank the Department of Veterans Affairs for open-sourcing this work.

I'm submitting this PR under the terms of the existing LICENSE.md in this repository. The authors of this work are Jamie McCarthy (@jamiemccarthy), Steve McKinney (@stephenmckinney), and Blake Thomson (@thomsbg).

Co-authored-by: Blake Thomson <thomsbg@gmail.com>
Co-authored-by: Steve McKinney <steve.mckinney@gmail.com>
@jamiemccarthy
Copy link
Author

Here's a six-month ping to see if anyone's able to look this over. I'll tag @rileyanderson and @ericboehs on the chance you're still involved with this project. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant