Bulkhead

reisenberger edited this page Oct 31, 2016 · 11 revisions

Bulkhead isolation policy (v5.0 onwards)

Purpose

To limit the resources consumable by the governed actions, such that a fault 'storm' cannot cascade into a failure that also brings down other operations.

Premise: 'One fault shouldn't bring down the whole ship!'

When a process begins to fault, it can build up a large number of requests, all potentially failing slowly in parallel. If unconstrained, these can chew up ever greater resource (CPU/threads/memory etc) in the host, degrading capability or eventually causing outright failure.

In a variant, a faulted downstream system can lead to a 'backing up' of large numbers of requests in its consumers. If ungoverned, these 'backed-up' calls can in turn consume all resources in the consumer, leading to a cascading failure upstream.

A bulkhead is a portion of a ship which can be isolated from others, such that if it fails (is holed), the whole ship does not sink.

Similarly, a bulkhead isolation policy assigns operations to constrained resource pools, such that one faulting channel of actions cannot swamp all resource (threads/CPU/whatever) in a system, bringing down other operations with it. The impact of a faulting system is isolated to the resource-pool to which it is limited; other threads/pools/capacity remain to continue serving other calls.

Syntax

BulkheadPolicy bulkhead = Policy
  .Bulkhead(int maxParallelization
           [, int maxQueuingActions]
           [, Action<Context> onBulkheadRejected]);

BulkheadPolicy bulkhead = Policy
  .BulkheadAsync(int maxParallelization
                [, int maxQueuingActions]
                [, Func<Context, Task> onBulkheadRejectedAsync]);

Parameters:

  • maxParallelization: the maximum parallelization of executions through the bulkhead;
  • maxQueuingActions (optional): the maximum number of actions that may be queuing (waiting to acquire an execution slot) at any time;
  • onBulkheadRejected/Async (optional): an action to run if the bulkhead rejects execution.

Throws:

  • BulkheadRejectedException, when an execution is rejected due to bulkhead and queue capacity being exceeded
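
As a minimal sketch of the synchronous syntax above (the capacity values, the class name and the TryRun helper are illustrative assumptions, not recommendations), a call site that tolerates rejection might look like this:

```csharp
using System;
using Polly;
using Polly.Bulkhead;

public static class BulkheadUsageSketch
{
    // Illustrative values only: at most 4 concurrent executions, plus 1 queued.
    private static readonly BulkheadPolicy Bulkhead = Policy.Bulkhead(4, 1);

    // Hypothetical call site: returns false if the bulkhead shed the call.
    public static bool TryRun(Action work)
    {
        try
        {
            Bulkhead.Execute(work);
            return true;
        }
        catch (BulkheadRejectedException)
        {
            // Bulkhead and queue were both full; the caller can retry later.
            return false;
        }
    }
}
```

The policy is held in a single static field so that every call through TryRun competes for the same execution slots.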

Operation

A useful way to envisage the policy is that separate bulkheads place calls into separate thread pools of the defined size.

This is not, however, the current implementation. Instead, as in the latest Hystrix, the bulkhead is implemented as a maximum-parallelization semaphore on actions through the bulkhead.

When a call is placed, the bulkhead policy:

  • determines if an execution slot is available in the bulkhead, and executes immediately if so
  • if not, determines if there is still space in the queue;
    • when no space remains in the queue, a BulkheadRejectedException is thrown.
    • when space remains in the queue, the execution enters the queue, and blocks (synchronously or asynchronously according to whether the policy is sync/async) until an execution slot can be obtained in the bulkhead.

The policy itself does not place calls onto threads; it assumes upstream systems have already placed calls into threads, but limits their parallelization of execution.
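
The steps above can be sketched with two semaphores. This is an illustration of the technique only, under the assumption that one semaphore counts execution slots and a second counts slots plus queue places. It is not Polly's source code, and the class name and exception type are hypothetical.

```csharp
using System;
using System.Threading;

// Illustrative sketch only; not Polly's implementation.
public sealed class SemaphoreBulkheadSketch
{
    private readonly SemaphoreSlim _executionSlots;
    private readonly SemaphoreSlim _admission; // execution slots + queue places

    public SemaphoreBulkheadSketch(int maxParallelization, int maxQueuingActions)
    {
        int admitted = maxParallelization + maxQueuingActions;
        _executionSlots = new SemaphoreSlim(maxParallelization, maxParallelization);
        _admission = new SemaphoreSlim(admitted, admitted);
    }

    public void Execute(Action action)
    {
        // No execution slot and no queue place left: reject immediately.
        if (!_admission.Wait(0))
        {
            throw new InvalidOperationException("Rejected: bulkhead and queue are full.");
        }

        try
        {
            // Either proceeds at once, or blocks the calling thread while 'queued'.
            _executionSlots.Wait();
            try
            {
                action();
            }
            finally
            {
                _executionSlots.Release();
            }
        }
        finally
        {
            _admission.Release();
        }
    }
}
```

Note that the sketch never starts a thread: the caller's own thread either runs the action, waits for a slot, or receives the rejection.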

What is the point of the 'queue'?

The bulkhead's primary goal is to act like a nightclub bouncer: to ensure that the maximum capacity of the 'club' inside is never exceeded. At the same time, just as for a nightclub bouncer, a secondary goal is to ensure the inside of the club is always at maximum utilisation.

To achieve this, it can make sense to have a queue of 'ready punters' (on the pavement outside the nightclub, if you like), waiting to take execution slots within the bulkhead as soon as one becomes free. This is the maxQueuingActions.

For guidance on setting maxQueuingActions, see Configuration recommendations below.

Interacting with policy operation

OnBulkheadRejected

An optional onBulkheadRejected / onBulkheadRejectedAsync delegate allows specific code to be executed (for example for logging) when the bulkhead rejects an execution.
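
For example, a rejection counter suitable for feeding a metric could be wired in as below. This is a sketch: the class name, the counter and the capacity of one slot with no queue are assumptions chosen to keep the example small.

```csharp
using System.Threading;
using Polly;
using Polly.Bulkhead;

public static class RejectionCounting
{
    private static int _rejections;

    // Number of executions the bulkhead has rejected so far.
    public static int Rejections => Volatile.Read(ref _rejections);

    // Illustrative capacity: one execution slot, no queue.
    public static readonly BulkheadPolicy Bulkhead =
        Policy.Bulkhead(1, 0, context => Interlocked.Increment(ref _rejections));
}
```

The delegate runs before the BulkheadRejectedException is thrown to the caller, so it is best kept short; logging or incrementing a counter are typical uses.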

State

Bulkhead policies expose two state properties, for reporting/health-monitoring:

  • BulkheadAvailableCount: the number of execution slots available in the bulkhead at present
  • QueueAvailableCount: the number of spaces available in the queue for a bulkhead execution slot

Diminishing levels of BulkheadAvailableCount or QueueAvailableCount may be monitored, for use as a custom metric to trigger automated horizontal scaling.
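
A small helper along these lines could publish the two properties to a log or metrics system. The class name, method name and output format are hypothetical; only the two properties come from the policy.

```csharp
using Polly.Bulkhead;

public static class BulkheadHealth
{
    // Hypothetical helper: formats the two state properties for a log line or metric.
    public static string Describe(BulkheadPolicy bulkhead)
    {
        return "execution slots free: " + bulkhead.BulkheadAvailableCount +
               ", queue places free: " + bulkhead.QueueAvailableCount;
    }
}
```

Both values are snapshots: under load they may already have changed by the time they are reported, so they suit trend monitoring rather than per-call decisions.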

Note: Code such as this is not necessary:

if (bulkhead.BulkheadAvailableCount + bulkhead.QueueAvailableCount > 0)
{
   bulkhead.Execute(...); // place call
} 

It is sufficient to place the call bulkhead.Execute(...): the bulkhead decides for itself whether the action can be executed or queued. Note also that the above code does not guarantee that the bulkhead will not reject the execution: in a highly concurrent environment, state could change between evaluating the if condition and executing the action. However, a code pattern such as the above can be used to reduce the number of BulkheadRejectedExceptions thrown while the bulkhead is at capacity, if this is a performance concern.

Configuration recommendations

Configuring maxParallelization - a business and architectural perspective

A bulkhead policy acts both as an isolation unit, and (intentionally) as a load-shedder. To preserve the health of the underlying machine, the bulkhead intentionally sheds load when its capacity and queue are exhausted.

Bulkheads work particularly well when used in combination with some kind of automated horizontal scaling. Ideally you want either to be tolerant of bulkhead rejections (of asking users or processes to 'come back later'); or to use a metric based on bulkhead rejections or diminishing BulkheadAvailableCount/QueueAvailableCount as a trigger for automated horizontal scaling. Both Azure and Amazon cloud services allow the definition of custom metrics as triggers for automated horizontal scaling.

The capacity to set for a bulkhead will depend critically on:

  • whether it governs an I/O-bound or CPU-bound operation
  • what other actions are being supported (which can also be expected to operate simultaneously at load) by the underlying application or host hardware/VM/cloud instance
  • what automated horizontal scaling you have available.

The recommended approach for configuring bulkhead capacity is to configure based on load/saturation-testing in or on a replica of your production environment.

It is important however not to set the bulkhead capacity for an individual operation near the peak manageable load, as if that process was running in isolation. Setting the bulkhead capacity at this level would provide no protection for other processes running simultaneously: the first process would have been permitted (when it faults) precisely to saturate all available resource, degrading other processes.

If your orientation is for maximum resilience at high volumes, and adequate automated horizontal scaling to support this is available, an application running four customer-critical processes, all expected to run simultaneously at load, might, for example, allocate bulkheads restricting these to less than a quarter of the host's capacity for each process in isolation. Such a stability-orientation prefers to trigger horizontal scaling - to reduce overall latency for customers by preserving the health of underlying individual hosts.

Alternatively, you may have an orientation seeking to trade or contain the cost of horizontal scaling, for slightly greater risk, by not restricting each bulkhead capacity quite as severely as the divide-by-four example above suggests, or by sharing a bulkhead across calls. Sharing a BulkheadPolicy instance across calls allows the group of calls to share the capacity: this can provide more flexibility (and more efficient use of resource) if the different calls can be expected, for example, to have different peak hours, at the expense that one stream of calls has more potential to degrade others.

Finally, remember also that partitioning at a software level within the application (as a BulkheadPolicy) is only one level at which isolation for stability may be provided. For instance, you might also partition your systems at the server level, reserving some servers purely for administrative functions, so that these vital administrative functions are always available even if consumer load crashes other servers. See Michael Nygard: Release It! for further isolation patterns.

Configuring maxParallelization - a technical and software perspective

For CPU-bound work (such as, say, resizing uploaded customer images), it makes sense to configure a bulkhead capacity (considering a call in isolation) in close proportion to the number of processors in the host system, just as you would for a Parallel.For. Limiting parallelism close to the number of processors prevents undue context-switching: there is usually a sweet spot for performance at or just above the number of processors in the host.
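
As a starting point for CPU-bound work, the capacity can be taken from the processor count, as in this sketch. The class and method names are hypothetical, and one slot per logical processor is an assumption to be tuned under load testing, not a recommendation.

```csharp
using System;
using Polly;
using Polly.Bulkhead;

public static class CpuBoundBulkhead
{
    // One execution slot per logical processor, no queue.
    // Reduce this if other work must share the same host.
    public static BulkheadPolicy Create()
    {
        return Policy.Bulkhead(Environment.ProcessorCount);
    }
}
```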

For async operations governing I/O bound work, the picture is more nuanced. A number of calls at any one time may be in the (non-thread/CPU-consuming) await phase of an async/await, and thus it is recommended to set bulkhead capacity (considering a call in isolation) at a significantly higher level than pure thread capacity. This allows .NET to optimise the use of threads between calls engaged and not engaged in actual activity. Optimum configuration will depend on the amount of time the calls typically spend in await; there is no real short-cut to performance-tuning for the characteristics of your individual system.
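
The sketch below shows an async bulkhead sized well above the processor count. The multiplier of 10, the queue length of 20 and the Task.Delay standing in for an I/O call are all arbitrary assumptions for illustration; real values should come from testing. The policy is declared with var because the concrete type returned by BulkheadAsync may differ between Polly versions.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Polly;

public static class IoBoundBulkhead
{
    // Runs callCount simulated I/O calls through one async bulkhead
    // and returns how many completed.
    public static async Task<int> RunAsync(int callCount)
    {
        // Arbitrary illustrative sizing: far more slots than processors,
        // because most calls spend their time awaiting rather than using a thread.
        var bulkhead = Policy.BulkheadAsync(Environment.ProcessorCount * 10, 20);

        var calls = Enumerable.Range(0, callCount)
            .Select(_ => bulkhead.ExecuteAsync(async () =>
            {
                await Task.Delay(50); // stand-in for an awaited I/O call
                return 1;
            }))
            .ToList();

        int[] results = await Task.WhenAll(calls);
        return results.Sum();
    }
}
```

If callCount exceeds the capacity plus the queue length, some of the calls would fault with BulkheadRejectedException, which Task.WhenAll would surface to the caller.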

This is a feature where we expect users' individual configurations to vary according to the actions they are governing: to help other users, we would be interested to hear your stories.

Configuring maxQueuingActions

maxQueuingActions provides flexibility by allowing you to limit parallelization, but not immediately reject executions. It also helps maximise throughput by providing for next actions to be ready-and-waiting, as soon as a bulkhead execution slot becomes available.

It is advisable however not to set maxQueuingActions high, for the following reasons:

Synchronous case (CPU-bound work)

In the (current) sync implementation, a queuing item blocks a thread. For this reason, we recommend setting maxQueuingActions only at 0 or 1 in the current synchronous case.

(Future releases of Polly may explore an alternative policy for scheduling synchronous work, such as a SchedulerPolicy which would schedule work on an underlying TaskScheduler. This would allow sync work to 'queue' outside a bulkhead without occupying a thread, but probably requires a new syntax combining elements of the existing synchronous Execute() and async ExecuteAsync() overloads.)

Asynchronous case (I/O-bound work)

Ideal configuration may depend on the characteristics of the governed calls, and should be established through experiment.

For very low latency, high throughput calls, it may make sense to allow a slightly higher queue level, so that next actions are always immediately available to fill a bulkhead slot as it becomes available. There is little point however in setting very high queue values: calls which queue experience higher latency while they wait for bulkhead slots to become available, so queue length should be set just high enough to keep the bulkhead supplied at maximum throughput, without introducing undue latency. The primary focus should be on getting the main capacity of the bulkhead right, and then on monitoring for diminishing available capacity (BulkheadAvailableCount/QueueAvailableCount) or rejected calls as the trigger for horizontal scaling.

Thread safety and policy reuse

Thread safety

The internal operation of BulkheadPolicy is thread-safe: multiple calls may safely be placed concurrently through a policy instance.

Policy reuse

BulkheadPolicy instances may be re-used across multiple call sites.

When a bulkhead instance is re-used across call sites, the call sites share the capacity of the bulkhead (allowing you to group actions into bulkheads), rather than each receiving a bulkhead of the given capacity.
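
The difference between shared and separate instances can be seen in a short sketch. The class, field and method names and the capacity of 10 are hypothetical.

```csharp
using Polly;
using Polly.Bulkhead;

public static class BulkheadGrouping
{
    // Separate instances: each stream of calls gets its own 10 slots.
    public static readonly BulkheadPolicy Orders = Policy.Bulkhead(10);
    public static readonly BulkheadPolicy Invoices = Policy.Bulkhead(10);

    // Executes through one bulkhead and, while the action is running,
    // reads the free execution slots of another (or the same) bulkhead.
    public static int SlotsFreeWhileRunning(BulkheadPolicy running, BulkheadPolicy observed)
    {
        return running.Execute(() => observed.BulkheadAvailableCount);
    }
}
```

Calling SlotsFreeWhileRunning(Orders, Orders) observes one slot in use, whereas SlotsFreeWhileRunning(Orders, Invoices) finds the Invoices bulkhead untouched: call sites only compete for capacity when they are given the same instance.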

When reusing policies, use an ExecutionKey to distinguish different call-site usages within logging and metrics.