## Bulkhead Pattern
A typical Tomcat container provides 200 threads by default to handle user requests. Now suppose one of the services we depend on starts misbehaving and all calls to it experience high latency. Any request to our service that relies on the misbehaving service gets stuck waiting. Soon all our Tomcat threads are saturated by network calls to the misbehaving service:

<img src="./images/thread_saturation.png" />

A failure in one service thus cascades into failures in other services. One way to solve this is to use a dedicated thread pool for each dependency call. Every dependency has its own fixed-size thread pool, and instead of making the request on a Tomcat thread, the request is made on one of the dependency's dedicated threads. In a simplistic implementation with *x* Tomcat threads and *n* dependencies, each dependency gets its own thread pool of *x/n* threads. If a dependency misbehaves, at most *x/n* Tomcat threads are blocked waiting on it.

<img src="./images/bulkhead.png" />
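The idea can be sketched in plain Java without any resilience library. This is a minimal sketch, not a production implementation: `DependencyBulkheads`, its method names, and the dependency names used with it are hypothetical, and the choice of a `SynchronousQueue` (no queueing, immediate rejection when all threads are busy) is an assumption made to keep the behaviour easy to follow.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: one fixed-size thread pool per dependency.
class DependencyBulkheads {
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
    private final int threadsPerDependency;

    DependencyBulkheads(int threadsPerDependency) {
        this.threadsPerDependency = threadsPerDependency;
    }

    private ExecutorService poolFor(String dependency) {
        // SynchronousQueue + AbortPolicy: nothing is queued, so a call is
        // rejected as soon as all threads of this dependency are busy.
        return pools.computeIfAbsent(dependency, name -> new ThreadPoolExecutor(
                threadsPerDependency, threadsPerDependency,
                0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(),
                new ThreadPoolExecutor.AbortPolicy()));
    }

    // Hands the call to the dependency's own pool.
    // Throws RejectedExecutionException if that pool is saturated.
    <T> Future<T> submit(String dependency, Callable<T> task) {
        return poolFor(dependency).submit(task);
    }

    // Runs the call on the dependency's pool and waits at most timeoutMillis.
    // Returns the fallback on rejection, failure, or timeout.
    <T> T call(String dependency, Callable<T> task, long timeoutMillis, T fallback) {
        Future<T> future;
        try {
            future = submit(dependency, task);
        } catch (RejectedExecutionException e) {
            return fallback; // bulkhead full: fail fast instead of blocking the caller
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        } catch (ExecutionException | TimeoutException e) {
            future.cancel(true); // interrupt the worker so the thread is freed
            return fallback;
        }
    }

    void shutdown() {
        pools.values().forEach(ExecutorService::shutdownNow);
    }
}
```

With this shape, a stuck dependency can hold at most `threadsPerDependency` threads; callers of every other dependency keep getting served from their own pools.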

What if we use Tomcat with virtual threads? Is Bulkhead still relevant? Yes, because a bulkhead can also guard access to a limited resource. Say we have two APIs that both connect to a database. Database connections are usually a limited resource and are pooled. If one API runs slow, long-running SQL queries, it will gradually take all the database connections, leaving none for the second API, even though that API normally makes fast queries.
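Guarding a limited resource does not need a thread pool at all; a counting semaphore is enough. The following is a minimal sketch under the assumption that each API gets its own guard; `ResourceBulkhead` and its methods are hypothetical names, not part of any library.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical guard that caps how many callers may use a shared resource at once.
class ResourceBulkhead {
    private final Semaphore permits;

    ResourceBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs the call on the current thread if a permit is free;
    // otherwise returns the fallback immediately without touching the resource.
    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();
        }
        try {
            return call.get();
        } finally {
            permits.release(); // always give the permit back, even on exceptions
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

For example, with a pool of 10 database connections, the slow API could be wrapped in `new ResourceBulkhead(6)`, which leaves at least 4 connections for the fast API no matter how slow the first one gets. The split 6/4 is only an illustration.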

## Circuit Breaker
If a certain number of calls to a dependency fail in a given time period, there is no point in making additional calls to the same service. A circuit breaker does exactly that: in the open state, any call going through the circuit breaker to the downstream service fails immediately and is routed to the fallback method. A circuit breaker has 3 states:
- Closed: normal state, all calls are permitted
- Open: when the number of failures exceeds a predefined threshold, the circuit breaker transitions to the open state and no call passes through. This gives the affected downstream service a chance to recover without being overwhelmed by incoming calls
- Half Open: after the set cooldown period, a few calls are permitted to check whether the downstream service is back up or still down. If the calls succeed, the circuit breaker returns to the closed state; otherwise it goes back to the open state.

The aim of a circuit breaker is to fail fast and not waste resources calling a dependency that we know is going to fail.
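The three states can be sketched as a small state machine. This is a deliberately simplified sketch: it trips on a number of consecutive failures rather than on failures within a time window, it does not limit how many trial calls run in the half-open state, and the clock is passed in as a parameter to keep the behaviour deterministic. `SimpleCircuitBreaker` and all of its members are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker: trips after N consecutive failures.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures that open the circuit
    private final long cooldownMillis;  // how long to stay OPEN before a trial call
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long cooldownMillis) {
        this.failureThreshold = failureThreshold;
        this.cooldownMillis = cooldownMillis;
    }

    synchronized State state() {
        return state;
    }

    // nowMillis is supplied by the caller (e.g. System.currentTimeMillis()).
    <T> T call(Supplier<T> action, Supplier<T> fallback, long nowMillis) {
        synchronized (this) {
            if (state == State.OPEN) {
                if (nowMillis - openedAt < cooldownMillis) {
                    return fallback.get(); // fail fast, the dependency is not called
                }
                state = State.HALF_OPEN;   // cooldown over: let a trial call through
            }
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure(nowMillis);
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure(long nowMillis) {
        consecutiveFailures++;
        // A failed trial call in HALF_OPEN re-opens the circuit straight away.
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = nowMillis;
        }
    }
}
```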

## Circuit Breaker vs Bulkhead
A circuit breaker is a reactive mechanism: it kicks in only after the dependent service starts failing. A bulkhead, on the other hand, is proactive: it limits concurrency. It doesn't care whether the target is healthy or not; it simply ensures that one specific caller doesn't consume more than its "fair share" of resources.

## Netflix Hystrix
Hystrix is a library that helps you control the interactions between distributed services by adding latency tolerance and fault tolerance logic. It provides:
- Support for Bulkhead pattern
- Circuit breaker
- Fallback logic in case of failures
- Caching


### Hystrix Flow Diagram
<img src="https://github.com/Netflix/Hystrix/wiki/images/hystrix-command-flow-chart.png" />

Steps:
1. Construct a `HystrixCommand` object to represent the request we are making to the dependency:

In [None]:
HystrixCommand command = new HystrixCommand(arg1, arg2);

2. Execute the Command

In [None]:
K value = command.execute();        // blocks until the result is available
Future<K> fValue = command.queue(); // returns a Future to collect the result later

3. If request caching is enabled for this command, and if the response to the request is available in the cache, this cached response will be immediately returned.

4. If the circuit is open (or “tripped”) then Hystrix will not execute the command but will route the flow to (8) get the fallback. If the circuit is closed then the flow proceeds to (5) to check if there is capacity available to run the command.

5. If the thread-pool and queue (or semaphore, if not running in a thread) that are associated with the command are full then Hystrix will not execute the command but will immediately route the flow to (8) Get the Fallback.

6. Hystrix invokes the `run()` method. If `run()` exceeds the command’s timeout value, the thread will throw a `TimeoutException` (or a separate timer thread will, if the command itself is not running in its own thread). In that case Hystrix routes the response through (8) Get the Fallback, and it discards the eventual return value of the `run()` method if that method does not cancel/interrupt.

7. Hystrix reports successes, failures, rejections, and timeouts to the circuit breaker, which maintains a rolling set of counters that calculate statistics. These statistics determine whether the circuit should be closed or open.

8. Hystrix tries to revert to the fallback whenever a command execution fails: when an exception is thrown by `run()` (6.), when the command is short-circuited because the circuit is open (4.), when the command’s thread pool and queue or semaphore are at capacity (5.), or when the command has exceeded its timeout length.
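The order of the checks in steps 3 to 8 can be sketched as one plain Java method. This is only an illustration of the control flow and not how Hystrix is implemented: the cache lookup, circuit state and pool capacity are passed in as ready-made values, step 7 (metrics reporting) is left out, and a throwaway single-thread executor stands in for the command's thread pool. `CommandFlowSketch` and its parameters are hypothetical names.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch of the decision order only; not Hystrix code.
class CommandFlowSketch {
    static <T> T execute(Optional<T> cachedResponse,
                         boolean circuitOpen,
                         boolean poolAtCapacity,
                         Callable<T> run,
                         Supplier<T> fallback,
                         long timeoutMillis) {
        if (cachedResponse.isPresent()) {
            return cachedResponse.get(); // step 3: serve from the request cache
        }
        if (circuitOpen) {
            return fallback.get();       // step 4 -> 8: short-circuited
        }
        if (poolAtCapacity) {
            return fallback.get();       // step 5 -> 8: rejected
        }
        // Step 6: run the command on a separate thread with a timeout.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<T> future = worker.submit(run);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        } catch (ExecutionException | TimeoutException e) {
            return fallback.get();       // step 8: run() threw or timed out
        } finally {
            future.cancel(true);         // no effect if the call already finished
            worker.shutdownNow();
        }
    }
}
```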

### Implementation Example

In [None]:
public class CommandHelloWorld extends HystrixCommand<String> {

    private final String name;

    public CommandHelloWorld(String name) {
        super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
        this.name = name;
    }

    @Override
    protected String run() {
        // a real example would do work like a network call here
        return "Hello " + name + "!";
    }
}

To specify fallback:

In [None]:
public class CommandHelloFailure extends HystrixCommand<String> {

    private final String name;

    public CommandHelloFailure(String name) {
        super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
        this.name = name;
    }

    @Override
    protected String run() {
        throw new RuntimeException("this command always fails");
    }

    @Override
    protected String getFallback() {
        return "Hello Failure " + name + "!";
    }
}

All exceptions thrown from the `run()` method except for `HystrixBadRequestException` count as failures and trigger `getFallback()` and circuit-breaker logic.

In [None]:
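@Override
protected String run() {
    try {
        return dependencyClient.getData();
    } catch (BadRequestException e) {
        // Caller error: not counted as a failure and does not trigger getFallback()
        throw new HystrixBadRequestException(e.getMessage(), e);
    }
}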

Hystrix uses keys as metadata to organize, monitor, and configure the behavior of commands. There are two variants:
- `HystrixCommandKey`: represents a specific action or dependency. It usually maps 1:1 to an API. For example, a user service's `/profile` and `/history` endpoints may be represented by two different keys. Each `CommandKey` gets its own circuit breaker.
- `HystrixCommandGroupKey`: is used to group related commands together. It usually represents a service or a resource. `CommandGroupKey` allows for bulkhead pattern since a thread-pool is created per group key.

**Bulkhead:** Hystrix can deploy Bulkhead pattern using either:
- thread pool: runs our code in a thread from a thread pool and controls the number of concurrent threads by a bounded queue and thread pool.
- semaphores: runs our code in the current thread and controls the number of concurrent threads by a `Semaphore`.

Hystrix allows us to modify the threadpool properties using `HystrixThreadPoolProperties.Setter()`. We can set properties like core size, max size, keepalive duration, queue size. The default values are:

In [None]:
static int default_coreSize = 10;            // core size of thread pool
static int default_maximumSize = 10;         // maximum size of thread pool
static int default_keepAliveTimeMinutes = 1; // minutes to keep a thread alive
static int default_maxQueueSize = -1;        // -1 means SynchronousQueue

In [None]:
class HelloWorldCommand extends HystrixCommand<String> {
    private final String name;

    public HelloWorldCommand(String name) {
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("HelloWorld"))
                .andCommandKey(HystrixCommandKey.Factory.asKey("Hello"))
                .andThreadPoolPropertiesDefaults(HystrixThreadPoolProperties.Setter().withCoreSize(50)));
        this.name = name;
    }
    
    // ...
}

If we want to use `Semaphore` based command:

In [None]:
class HelloWorldCommand extends HystrixCommand<String> {
    private final String name;

    public HelloWorldCommand(String name) {
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("HelloWorld"))
                .andCommandKey(HystrixCommandKey.Factory.asKey("Hello"))
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                    .withExecutionIsolationStrategy(HystrixCommandProperties.ExecutionIsolationStrategy.SEMAPHORE)));
        this.name = name;
    }
    
    // ...
}

Sometimes we may want commands to use different threadpools, but share the same command group (since commands in the same group are shown together in the dashboard). To do so, use `HystrixThreadPoolKey`:

In [None]:
// SMS Command - Fast & Critical
public class SendSmsCommand extends HystrixCommand<Void> {
    public SendSmsCommand() {
        super(Setter
            .withGroupKey(HystrixCommandGroupKey.Factory.asKey("CommunicationService"))
            .andCommandKey(HystrixCommandKey.Factory.asKey("SendSMS"))
            .andThreadPoolKey(HystrixThreadPoolKey.Factory.asKey("SmsPool"))); // Dedicated Pool
    }
}

// Email Command - Slow & Heavy
public class SendEmailCommand extends HystrixCommand<Void> {
    public SendEmailCommand() {
        super(Setter
            .withGroupKey(HystrixCommandGroupKey.Factory.asKey("CommunicationService"))
            .andCommandKey(HystrixCommandKey.Factory.asKey("SendEmail"))
            .andThreadPoolKey(HystrixThreadPoolKey.Factory.asKey("EmailPool"))); // Separate Pool
    }
}

**Circuit Breaker:** the following properties can be associated:

In [None]:
// Each value set in the examples below is also that property's default value
HystrixCommandProperties.Setter().withCircuitBreakerEnabled(true); // Enable or disable circuit breaker

HystrixCommandProperties.Setter().withCircuitBreakerRequestVolumeThreshold(20);
// Sets the minimum number of requests in a rolling window that will trip the circuit. If the value is 20, 
// then if only 19 requests are received in the rolling window (say a window of 10 seconds)
// the circuit will not trip open even if all 19 failed.

HystrixCommandProperties.Setter().withCircuitBreakerSleepWindowInMilliseconds(5000);
// Sets the amount of time, after tripping the circuit, to reject requests before allowing attempts again
// to determine if the circuit should again be closed.

HystrixCommandProperties.Setter().withCircuitBreakerErrorThresholdPercentage(50);
// Sets the error percentage at or above which the circuit should trip open and start short-circuiting
// requests to fallback logic.

HystrixCommandProperties.Setter().withMetricsRollingStatisticalWindowInMilliseconds(10000);
// Sets the duration of the statistical rolling window, in milliseconds. This is how long Hystrix keeps
// metrics for the circuit breaker to use and for publishing.

HystrixCommandProperties.Setter().withMetricsRollingStatisticalWindowBuckets(10);
// Sets the number of buckets the rolling statistical window is divided into. Following must be true — 
// metrics.rollingStats.timeInMilliseconds % metrics.rollingStats.numBuckets == 0

After the sleep window expires, Hystrix transitions to the HALF-OPEN state and allows a single request (the count is hardcoded) through to check whether the dependency is back to normal.
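How the request volume threshold and the error percentage threshold combine can be written down as a small worked example. This is a sketch of the semantics described in the property comments above, not Hystrix's own code; `HystrixTripMath` and its method names are hypothetical.

```java
// Hypothetical helper that mirrors the documented semantics of the properties above.
class HystrixTripMath {

    // Length of one bucket in the rolling window, e.g. 10000 ms / 10 buckets = 1000 ms.
    static int bucketSizeMillis(int windowMillis, int numBuckets) {
        if (windowMillis % numBuckets != 0) {
            throw new IllegalArgumentException("window must divide evenly into buckets");
        }
        return windowMillis / numBuckets;
    }

    // True if the circuit should trip open for the counts seen in the current window.
    static boolean shouldTrip(int totalRequests, int failedRequests,
                              int requestVolumeThreshold, int errorThresholdPercentage) {
        if (totalRequests == 0 || totalRequests < requestVolumeThreshold) {
            return false; // too few requests in the window to judge
        }
        int errorPercentage = failedRequests * 100 / totalRequests;
        return errorPercentage >= errorThresholdPercentage;
    }
}
```

With the defaults (threshold 20, error percentage 50), `shouldTrip(19, 19, 20, 50)` is `false` because the volume threshold is not met, while `shouldTrip(20, 10, 20, 50)` is `true`.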

## Resilience4J
Resilience4j is a lightweight fault tolerance library designed for functional programming. Resilience4j provides higher-order functions (decorators) to enhance any functional interface, lambda expression or method reference with a Circuit Breaker, Rate Limiter, Retry or Bulkhead.

### Circuit Breaker
Resilience4j's Circuit Breaker has three normal states, `CLOSED`, `OPEN` and `HALF_OPEN`, plus the special states `DISABLED` and `FORCED_OPEN` (newer versions also add `METRICS_ONLY`).  
<img src="images/cb_states.png" >  

There are two ways to determine threshold:
- **Count based:** we maintain a window of the last N calls. Each new measurement is added to the window, and the oldest one is evicted once the window holds more than N. A threshold can then be defined as: when 50% of the last N calls have failed, transition the circuit breaker to the open state.
- **Time based:** we maintain a window of all calls made in the last N seconds. A threshold can then be defined as: when 50% of the calls in the last N seconds have failed, transition the circuit breaker to the open state.

The state of the CircuitBreaker changes from CLOSED to OPEN when the failure rate is equal to or greater than a configurable threshold. By default every exception counts as a failure, though we can configure the set of exceptions that count as failures.

The CircuitBreaker also changes from CLOSED to OPEN when the percentage of slow calls is equal to or greater than a configurable threshold. As an example, open the circuit when 50% of all calls take more than 10 seconds.

The failure rate and slow call rate can only be calculated, if a minimum number of calls were recorded. For example, if the minimum number of required calls is 10, then at least 10 calls must be recorded, before the failure rate can be calculated. If only 9 calls have been evaluated the CircuitBreaker will not trip open even if all 9 calls have failed.
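The count-based window and the minimum-calls rule can be sketched with a ring buffer. This is a minimal sketch and not Resilience4j's implementation; `CountBasedFailureWindow` is a hypothetical class, and returning `-1` while too few calls have been recorded is a convention chosen for this sketch.

```java
// Hypothetical count-based sliding window over the last N call outcomes.
class CountBasedFailureWindow {
    private final boolean[] failures;       // ring buffer: true = call failed
    private final int minimumNumberOfCalls;
    private int next = 0;                   // slot that the next outcome overwrites
    private int recorded = 0;               // outcomes currently held (at most N)
    private int failed = 0;                 // failures currently held

    CountBasedFailureWindow(int windowSize, int minimumNumberOfCalls) {
        this.failures = new boolean[windowSize];
        this.minimumNumberOfCalls = minimumNumberOfCalls;
    }

    void record(boolean failure) {
        if (recorded == failures.length) {
            if (failures[next]) {
                failed--;                   // the evicted (oldest) outcome was a failure
            }
        } else {
            recorded++;
        }
        failures[next] = failure;
        if (failure) {
            failed++;
        }
        next = (next + 1) % failures.length;
    }

    // Failure rate in percent, or -1 while fewer than the minimum calls were recorded.
    float failureRate() {
        if (recorded < minimumNumberOfCalls) {
            return -1f;
        }
        return failed * 100f / recorded;
    }

    boolean shouldOpen(float failureRateThreshold) {
        float rate = failureRate();
        return rate >= 0 && rate >= failureRateThreshold;
    }
}
```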

The CircuitBreaker rejects calls with a `CallNotPermittedException` when it is `OPEN`. After a wait time duration has elapsed, the CircuitBreaker state changes from `OPEN` to `HALF_OPEN` and permits a configurable number of calls to see if the backend is still unavailable or has become available again. Further calls are rejected with a `CallNotPermittedException` until all permitted calls have completed. If the failure rate or slow call rate is then equal to or greater than the configured threshold, the state changes back to `OPEN`. If the failure rate and slow call rate are below the threshold, the state changes back to `CLOSED`.

To create a circuit breaker:

In [None]:
// Values specified are the defaults unless noted otherwise. Durations are passed as java.time.Duration.
CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerConfig.custom()
    .failureRateThreshold(50)                      // Failure rate threshold in percentage.
    .slowCallRateThreshold(100)                    // Slow call rate threshold in percentage.
    .slowCallDurationThreshold(Duration.ofMillis(60000)) // Calls that take longer than this count as slow
    .minimumNumberOfCalls(100)                     // Calls required before the failure/slow call rate is calculated
    .permittedNumberOfCallsInHalfOpenState(10)     // Number of calls that go through in half open state (hardcoded as 1 in Hystrix)
    .maxWaitDurationInHalfOpenState(Duration.ofMillis(0)) // Longest time the CircuitBreaker may stay in half open state before
                                                   // it switches to open. 0 means wait until all permitted calls have completed.
    .slidingWindowType(SlidingWindowType.COUNT_BASED)  // Type of sliding window TIME_BASED or COUNT_BASED
    .slidingWindowSize(100)                        // Count size or number of seconds
    .waitDurationInOpenState(Duration.ofMillis(60000)) // The time that the CircuitBreaker should wait before transitioning from open to half-open.
    .automaticTransitionFromOpenToHalfOpenEnabled(false) // If set to true it means that the CircuitBreaker will automatically transition from 
                                                         // open to half-open state and no call is needed to trigger the transition.
    .recordExceptions(IOException.class, TimeoutException.class)  // Not the default value, by default all exceptions are recorded as failure
    .ignoreExceptions(BusinessException.class)     // Not the default value. List of exceptions that are ignored and neither count as a failure nor success. 
    .build();

// Or use default  configuration
CircuitBreakerConfig defaultConfigs = CircuitBreakerConfig.ofDefaults();

CircuitBreakerRegistry circuitBreakerRegistry = CircuitBreakerRegistry.of(circuitBreakerConfig);
CircuitBreaker circuitBreaker1 = circuitBreakerRegistry.circuitBreaker("serviceA");
CircuitBreaker circuitBreaker2 = circuitBreakerRegistry.circuitBreaker("serviceB");

A registry is basically a map of circuit breaker configurations:

In [None]:
CircuitBreakerRegistry circuitBreakerRegistry = CircuitBreakerRegistry.of(circuitBreakerConfig);

// Is equivalent to (commented out so the variable is not declared twice):
// CircuitBreakerRegistry circuitBreakerRegistry = CircuitBreakerRegistry.of(Map.of("default", circuitBreakerConfig));

// Add additional configurations
circuitBreakerRegistry.addConfiguration("alt", alternativeConfiguration);
CircuitBreaker altCircuitBreaker = circuitBreakerRegistry.circuitBreaker("serviceC", "alt");

To use a circuit breaker, we can decorate a `Supplier`, `Callable`, `Runnable`, `Consumer`, `Function` or another functional interface with a CircuitBreaker. As a recap, here are the definitions of these interfaces, followed by an example that decorates a `Supplier`:

In [None]:
@FunctionalInterface
public interface Supplier<T> {
    T get();
}

@FunctionalInterface
public interface Callable<V> {
    V call() throws Exception;
}

@FunctionalInterface
public interface Consumer<T> {
    void accept(T var1);
}

@FunctionalInterface
public interface Runnable {
    public abstract void run();
}

@FunctionalInterface
public interface Function<T, R> {
    R apply(T t);
}

public static void main(String[] args) {
    // `circuitBreaker` is any CircuitBreaker obtained from a registry, e.g. circuitBreaker1 above
    Supplier<String> supplier = () -> {
        LocalTime now = LocalTime.now();
        if (now.getSecond() > 30)
            return now.toString();
        else
            throw new RuntimeException("Second is not greater than 30");
    };

    Supplier<String> decoratedSupplier = Decorators.ofSupplier(supplier)
        .withCircuitBreaker(circuitBreaker)
        .decorate();

    System.out.println(decoratedSupplier.get());
}

// Decorators has other methods to pass in Function, Runnable, etc
// Decorators.ofFunction(function).withCircuitBreaker(circuitBreaker).decorate();

To listen for events:

In [None]:
circuitBreaker.getEventPublisher()
    .onSuccess(event -> logger.info(...))
    .onError(event -> logger.info(...))
    .onIgnoredError(event -> logger.info(...))
    .onReset(event -> logger.info(...))
    .onStateTransition(event -> logger.info(...));

// Or if you want to register a consumer listening
// to all events, you can do:
circuitBreaker.getEventPublisher()
    .onEvent(event -> logger.info(...));

### Bulkhead
Resilience4j offers two variants of bulkhead:
- `SemaphoreBulkhead`: based on `Semaphore`
- `ThreadPoolBulkhead`: based on a bounded queue and a fixed thread pool (implemented by `FixedThreadPoolBulkhead`).

To create a semaphore based bulkhead:

In [None]:
// Create a configuration:
BulkheadConfig bulkheadConfig = BulkheadConfig.custom()
                .maxConcurrentCalls(5) // Max amount of parallel executions allowed
                .maxWaitDuration(Duration.ofSeconds(60)) // Max amount of time a thread should be blocked for 
                                                         // when attempting to enter a saturated bulkhead.
                .build();

// Or use default configuration which sets
// maxConcurrentCalls = 25
// maxWaitDuration = 0
BulkheadConfig defaultConfig = BulkheadConfig.ofDefaults();

BulkheadRegistry registry = BulkheadRegistry.of(bulkheadConfig);
Bulkhead bulkhead1 = registry.bulkhead("serviceA");
Bulkhead bulkhead2 = registry.bulkhead("serviceB");

To create a threadpool based bulkhead:

In [None]:
ThreadPoolBulkheadConfig bulkheadConfig = ThreadPoolBulkheadConfig.custom()
        .maxThreadPoolSize(10)
        .coreThreadPoolSize(5)
        .queueCapacity(20)
        .build();

ThreadPoolBulkheadRegistry registry = ThreadPoolBulkheadRegistry.of(bulkheadConfig);
ThreadPoolBulkhead bulkhead1 = registry.bulkhead("serviceA");
ThreadPoolBulkhead bulkhead2 = registry.bulkhead("serviceB");

`BulkheadRegistry` and `ThreadPoolBulkheadRegistry` save named configs allowing us to store multiple different configurations under different names:

In [None]:
ThreadPoolBulkheadRegistry registry = ThreadPoolBulkheadRegistry.of(bulkheadConfig);

// Under the hood it is equivalent to (commented out so the variable is not declared twice):
// ThreadPoolBulkheadRegistry registry = ThreadPoolBulkheadRegistry.of(Map.of("default", bulkheadConfig));

// There can be multiple configs associated with the same registry
ThreadPoolBulkheadRegistry multiConfigRegistry = ThreadPoolBulkheadRegistry.of(
    Map.of("default", bulkheadConfig, "alt", alternativeConfig));
ThreadPoolBulkhead defaultBulkhead = multiConfigRegistry.bulkhead("serviceA", "default");
ThreadPoolBulkhead altBulkhead = multiConfigRegistry.bulkhead("serviceB", "alt");

There are multiple ways to use the created bulkhead. The example below shows each variant executing a `Callable`:

In [None]:
// Semaphore based: runs on the calling thread and returns the value directly
String value = bulkhead1.executeCallable(() ->
        Database.getData()
);

// Threadpool based
altBulkhead.executeCallable(() -> Database.getData())
    .thenAccept(System.out::println)
    .exceptionally(err -> {
        System.err.println(err.getMessage());
        return null;
    });