Breaker

A Circuit Breaker in Elixir for making async HTTP(S) requests to external resources. Uses HTTPotion to make requests.

The following README documentation is for the master branch. Maybe you're looking for the 1.0.0 docs?

Installation

Add this project as a dependency in your mix.exs file:

defp deps do
  [
    {:breaker, "~> 1.0.0"}
  ]
end

And then run:

$ mix deps.get

Simple Usage

To create a circuit breaker for an external resource, do something like the following:

{:ok, user_service} = Breaker.start_link([url: "http://example.com/users/"])

Then pass it to Breaker's request functions to make HTTP calls:

GET example

You can make a request for some data you know you'll need later:

# Makes a GET request to "http://example.com/users/42"
user_request = Breaker.get(user_service, "/42")

# do some other things, then later, when you need it

user = Task.await(user_request)

POST example

Say you need to create a new user and ensure the response from the other service was good.

body = build_new_user_body(new_user)
request = Breaker.post(user_service, "/", [body: body])

# do some other things,

# then ensure you got a good response from your request,
# otherwise put it in Redis or something for later
response = Task.await(request)
cond do
  Breaker.error?(response) ->
    # put this request in Redis for later
    :retry_later

  response.status_code == 200 ->
    # yay, continue
    :ok

  true ->
    # other possible responses, like 403 or 422
    {:unexpected, response.status_code}
end

Other HTTP Methods

Breaker has a function for each of the HTTP methods: GET, POST, PUT, PATCH, HEAD, DELETE, and OPTIONS.

They follow the same easy convention as HTTPotion: Breaker.get/3, Breaker.put/3, etc.

Naming your Breaker

Breaker.start_link can accept an extra parameter and will pass it directly to GenServer as a name to register the process.

Breaker.start_link([url: "http://example.com/users/"], :user_service)
# Now you can just use the registered name
user_request = Breaker.get(:user_service, "/42")

This makes it easier to use application-wide breakers and supervision trees.
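
As a rough sketch of what that can look like, the supervisor below starts one named breaker. The module name MyApp.BreakerSupervisor is hypothetical, and the child spec is written as an explicit map because this README does not say whether Breaker defines its own child_spec/1. It only relies on Breaker.start_link/2 as shown above.

```elixir
defmodule MyApp.BreakerSupervisor do
  use Supervisor

  def start_link(_opts \\ []) do
    Supervisor.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # Child specs as plain maps; each one calls Breaker.start_link(options, name).
  def children do
    [
      %{
        id: :user_service,
        start: {Breaker, :start_link, [[url: "http://example.com/users/"], :user_service]}
      }
    ]
  end

  @impl true
  def init(:ok) do
    # Restart only the breaker that crashed.
    Supervisor.init(children(), strategy: :one_for_one)
  end
end
```

Once this supervisor is running, any process can call Breaker.get(:user_service, "/42") without holding on to a pid.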

Other Helpful Functions

  • Breaker.open?/1 takes a breaker and returns a boolean, asking if it is open (won't allow network flow)
  • Breaker.error?/1 takes a response and returns a boolean, asking if the response was some sort of error (Status Code of 500, timeout, Breaker.OpenCircuitError)
  • Breaker.trip/1 sets the breaker's status to open, disallowing network flow.
  • Breaker.reset/1 sets the breaker's status to closed, allowing network flow.

You probably don't want to make use of Breaker.trip/1 and Breaker.reset/1 because the breaker's status will be recalculated after a request and override what you've manually set.
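
One way to combine the read-only helpers, sketched under the assumption that you want to skip the call entirely (for example, to fall back to a cache) when the circuit is already open. UserLookup and its return tuples are hypothetical names chosen for illustration; only Breaker.open?/1, Breaker.get, and Breaker.error?/1 come from this README.

```elixir
defmodule UserLookup do
  # Hypothetical wrapper: returns {:ok, response} or {:error, reason}.
  def fetch(breaker, path) do
    if Breaker.open?(breaker) do
      # Circuit is open, so don't even start a request.
      {:error, :circuit_open}
    else
      response = breaker |> Breaker.get(path) |> Task.await()

      if Breaker.error?(response) do
        {:error, response}
      else
        {:ok, response}
      end
    end
  end
end
```

The breaker may still trip between the open?/1 check and the request, which is why the response is checked with error?/1 as well.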

Configuration

You can configure your new breaker with a few different options. The following options affect each request made:

  • url: Required, the base URL for your external service, like "http://your-domain.com/users/" for your user service, or "http://users.your-domain.com/"
  • headers: Any headers (like in HTTPotion) that should be included in EVERY request made by the circuit breaker. This could be something like an authentication token or a service identifier for logs. The default is [].
  • timeout: The number of milliseconds before giving up on a request. This is passed to HTTPotion and has a default of 3000, or 3 seconds.

The following options affect how the breaker's status is calculated:

  • error_threshold: The fraction (as a float) of requests that are allowed to be bad (bad = 500 or timeout). The default is 0.05, or 5%. Once this threshold is passed, the breaker trips and subsequent requests return %Breaker.OpenCircuitError{} responses.
  • bucket_length: The breaker uses multiple buckets in a health window to determine the error_rate. This setting specifies, in milliseconds, how long a bucket should be. The default is 1000 or 1 second.
  • window_length: The length (in buckets) of the health window. This number, multiplied by bucket_length is the total number of milliseconds used to calculate health. The default is 10.
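
Putting the options together, a breaker with every setting spelled out might look like the following. All values except headers are the documented defaults. The header name and value are made up for illustration, written in the keyword-list style HTTPotion accepts.

```elixir
{:ok, user_service} =
  Breaker.start_link(
    url: "http://example.com/users/",
    # Hypothetical header sent with every request.
    headers: ["X-Service-Name": "my_app"],
    # Give up on a request after 3 seconds.
    timeout: 3000,
    # Trip once more than 5% of requests in the window are bad.
    error_threshold: 0.05,
    # 10 buckets of 1 second each: a 10-second health window.
    bucket_length: 1000,
    window_length: 10
  )
```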

Understanding 'buckets' and 'windows'

The breaker uses multiple 'buckets' in a 'window' to determine health and roll out old requests. Buckets are measured in time (milliseconds) and windows are measured in buckets. This means that using the defaults, health is calculated based on responses received in the last 10 seconds of operation. I highly encourage you to play with these settings to accommodate your individual traffic.

To give an example, say your application is happily going along, processing requests and making requests of an external service, the User Service. It's making an average of 1 request per second, using the default bucket_length and window_length. Then, it hits a 500 error. Up to this point, its error rate was 0%, but it has just jumped to 10%, above the default error_threshold. Now, when you make a new request, the breaker is open. Instead of waiting up to 3 seconds to get a 500 error or timeout, the request fails fast, returning a %Breaker.OpenCircuitError{}. In about 9 more seconds, the bucket that contained our 500 error will be rotated out, closing the circuit and leaving us with a clean slate.

If our very next request now times out or gives a 500 (because the external service still isn't working properly), then we have an error rate of 100% and the circuit opens for another 10 seconds.

If, instead, the service had recovered while the circuit was open, then we only have (at most) about 10 seconds of missed requests. Hopefully, we have designed our application such that those requests are not absolutely required, or we've stashed them in a queue somewhere for processing later.
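
The arithmetic in this scenario can be reproduced with a small standalone sketch. This is not Breaker's internal implementation. It models each bucket as a hypothetical {total, failed} tuple and assumes the circuit opens when the error rate is strictly greater than error_threshold.

```elixir
defmodule HealthWindowSketch do
  # Total time span covered by the health window, in milliseconds.
  def window_ms(bucket_length, window_length), do: bucket_length * window_length

  # Fraction of failed requests across all buckets in the window.
  # Each bucket is a {total_requests, failed_requests} tuple.
  def error_rate(buckets) do
    {total, failed} =
      Enum.reduce(buckets, {0, 0}, fn {t, f}, {ta, fa} -> {ta + t, fa + f} end)

    if total == 0, do: 0.0, else: failed / total
  end

  # Assumption: the circuit opens once the rate exceeds the threshold.
  def open?(buckets, error_threshold), do: error_rate(buckets) > error_threshold
end

# Ten 1-second buckets with one request each; the newest request got a 500.
buckets = List.duplicate({1, 0}, 9) ++ [{1, 1}]

IO.inspect(HealthWindowSketch.window_ms(1000, 10))   # 10000
IO.inspect(HealthWindowSketch.error_rate(buckets))   # 0.1
IO.inspect(HealthWindowSketch.open?(buckets, 0.05))  # true

# After the window rotates, a single failing request is the only sample.
IO.inspect(HealthWindowSketch.error_rate([{1, 1}]))  # 1.0
```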

The above is a greatly simplified example because normally you'll want to create a breaker for something that you'll need to make calls against at a much higher rate. Basically, keep the following things in mind when configuring your breaker:

  • bucket_length should be long enough that, on average, you'll have 5 or more requests in that period of time. For the above example, we'd probably want 5000.
  • window_length should be high enough that you give the external service time to recover, but low enough that errors from a while ago aren't bogging down your application's performance or features now. The default might just be good enough unless you've found a reason to change it.
  • error_threshold should be low enough that your users aren't dealing with a really slow experience. It will need to be higher if bucket_length and window_length aren't enough to get a good sample of the requests. In the above example, it would have probably been acceptable to have an error_threshold of something like 0.2 (20%) so the breaker can tolerate more than 1 error before breaking the circuit.
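
These rules of thumb can be turned into quick back-of-the-envelope calculations. The module below is a hypothetical helper, not part of Breaker. It assumes a steady request rate and, as before, that the circuit opens only when the error rate is strictly greater than error_threshold.

```elixir
defmodule BreakerTuningSketch do
  @min_requests_per_bucket 5

  # Smallest bucket_length (ms) that averages at least 5 requests per bucket.
  def bucket_length_ms(requests_per_second) when requests_per_second > 0 do
    ceil(@min_requests_per_bucket * 1000 / requests_per_second)
  end

  # Number of failures a full window can absorb without the error rate
  # exceeding the threshold.
  def tolerated_errors(requests_in_window, error_threshold) when requests_in_window > 0 do
    Enum.count(1..requests_in_window, fn errors ->
      errors / requests_in_window <= error_threshold
    end)
  end
end

# 1 request per second, as in the example above.
IO.inspect(BreakerTuningSketch.bucket_length_ms(1))        # 5000

# 10 requests in the window: the default threshold tolerates no errors,
# while 0.2 tolerates two.
IO.inspect(BreakerTuningSketch.tolerated_errors(10, 0.05)) # 0
IO.inspect(BreakerTuningSketch.tolerated_errors(10, 0.2))  # 2
```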

Contributing

Bug reports are welcome and contributions are encouraged. If something isn't working the way it should or a convention isn't being followed, that's a bug. If there isn't documentation for something, I consider that a bug, too.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project, you agree to abide by its terms.

License

This project is released under the MIT license, as detailed in the included license.txt file.